Preface


This document is an instructor’s manual to accompany Introduction to Algorithms, 
Second Edition, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, 
and Clifford Stein. It is intended for use in a course on algorithms. You might 
also find some of the material herein to be useful for a CS 2-style course in data 
structures. 

Unlike the instructor’s manual for the first edition of the text—which was organized 
around the undergraduate algorithms course taught by Charles Leiserson at MIT 
in Spring 1991—we have chosen to organize the manual for the second edition 
according to chapters of the text. That is, for most chapters we have provided a 
set of lecture notes and a set of exercise and problem solutions pertaining to the 
chapter. This organization allows you to decide how to best use the material in the 
manual in your own course. 

We have not included lecture notes and solutions for every chapter, nor have we 
included solutions for every exercise and problem within the chapters that we have 
selected. We felt that Chapter 1 is too nontechnical to include here, and Chapter 10
consists of background material that often falls outside algorithms and data-structures
courses. We have also omitted the chapters that are not covered in the
courses that we teach: Chapters 18-20 and 28-35, as well as Appendices A-C;
future editions of this manual may include some of these chapters. There are two 
reasons that we have not included solutions to all exercises and problems in the 
selected chapters. First, writing up all these solutions would take a long time, and 
we felt it more important to release this manual in as timely a fashion as possible. 
Second, if we were to include all solutions, this manual would be longer than the 
text itself! 

We have numbered the pages in this manual using the format CC-PP, where CC 
is a chapter number of the text and PP is the page number within that chapter’s 
lecture notes and solutions. The PP numbers restart from 1 at the beginning of each 
chapter’s lecture notes. We chose this form of page numbering so that if we add 
or change solutions to exercises and problems, the only pages whose numbering is 
affected are those for the solutions for that chapter. Moreover, if we add material 
for currently uncovered chapters, the numbers of the existing pages will remain 
unchanged. 

The lecture notes 


The lecture notes are based on three sources: 
• Some are from the first-edition manual, and so they correspond to Charles
Leiserson’s lectures in MIT’s undergraduate algorithms course, 6.046.

• Some are from Tom Cormen’s lectures in Dartmouth College’s undergraduate 
algorithms course, CS 25. 

• Some are written just for this manual. 

You will find that the lecture notes are more informal than the text, as is appropriate
for a lecture situation. In some places, we have simplified the material for
lecture presentation or even omitted certain considerations. Some sections of the
text, usually starred, are omitted from the lecture notes. (We have included lecture
notes for one starred section: 12.4, on randomly built binary search trees,
which we cover in an optional CS 25 lecture.)

In several places in the lecture notes, we have included “asides” to the instructor.
The asides are typeset in a slanted font and are enclosed in square brackets.
[Here is an aside.] Some of the asides suggest leaving certain material on the
board, since you will be coming back to it later. If you are projecting a presentation
rather than writing on a blackboard or whiteboard, you might want to mark
slides containing this material so that you can easily come back to them later in the
lecture.

We have chosen not to indicate how long it takes to cover material, as the time necessary
to cover a topic depends on the instructor, the students, the class schedule,
and other variables.

There are two differences in how we write pseudocode in the lecture notes and the 
text: 

• Lines are not numbered in the lecture notes. We find them inconvenient to 
number when writing pseudocode on the board. 

• We avoid using the length attribute of an array. Instead, we pass the array 
length as a parameter to the procedure. This change makes the pseudocode 
more concise, as well as matching better with the description of what it does. 

We have also minimized the use of shading in figures within lecture notes, since 
drawing a figure with shading on a blackboard or whiteboard is difficult. 


The solutions 

The solutions are based on the same sources as the lecture notes. They are written 
a bit more formally than the lecture notes, though a bit less formally than the text. 
We do not number lines of pseudocode, but we do use the length attribute (on the
assumption that you will want your students to write pseudocode as it appears in
the text).

The index lists all the exercises and problems for which this manual provides solutions,
along with the number of the page on which each solution starts.

Asides appear in a handful of places throughout the solutions. Also, we are less
reluctant to use shading in figures within solutions, since these figures are more 
likely to be reproduced than to be drawn on a board. 
Source files

For several reasons, we are unable to publish or transmit source files for this manual.
We apologize for this inconvenience.

In June 2003, we made available a clrscode package for LaTeX 2ε. It enables
you to typeset pseudocode in the same way that we do. You can find this package
at http://www.cs.dartmouth.edu/~thc/clrscode/. That site also
includes documentation.


Reporting errors and suggestions 

Undoubtedly, instructors will find errors in this manual. Please report errors by
sending email to clrs-manual-bugs@mhhe.com.

If you have a suggestion for an improvement to this manual, please feel free to
submit it via email to clrs-manual-suggestions@mhhe.com.

As usual, if you find an error in the text itself, please verify that it has not already
been posted on the errata web page before you submit it. You can use the MIT
Press web site for the text, http://mitpress.mit.edu/algorithms/, to
locate the errata web page and to submit an error report.

We thank you in advance for your assistance in correcting errors in both this manual 
and the text. 
Lecture Notes for Chapter 2:
Getting Started 


Chapter 2 overview 

Goals: 

• Start using frameworks for describing and analyzing algorithms. 

• Examine two algorithms for sorting: insertion sort and merge sort. 

• See how to describe algorithms in pseudocode. 

• Begin using asymptotic notation to express running-time analysis. 

• Learn the technique of “divide and conquer” in the context of merge sort. 


Insertion sort 

The sorting problem 

Input: A sequence of n numbers $\langle a_1, a_2, \ldots, a_n \rangle$.

Output: A permutation (reordering) $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input sequence such
that $a'_1 \le a'_2 \le \cdots \le a'_n$.

The sequences are typically stored in arrays. 
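
The input/output specification above can be turned into a small checker. The following is a minimal sketch in Python; the function name `solves_sorting_problem` is an assumption made for illustration and does not appear in the text. It tests the two conditions separately: the output must be a permutation of the input, and it must be in nondecreasing order.

```python
from collections import Counter


def solves_sorting_problem(original, result):
    """Return True if result is a nondecreasing permutation of original."""
    # Same multiset of keys: result is a reordering of original.
    is_permutation = Counter(original) == Counter(result)
    # Each element is <= its successor.
    is_nondecreasing = all(
        result[k] <= result[k + 1] for k in range(len(result) - 1)
    )
    return is_permutation and is_nondecreasing
```

A checker like this is handy for testing any of the sorting procedures in this chapter against the problem statement rather than against another implementation.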

We also refer to the numbers as keys. Along with each key may be additional 
information, known as satellite data. [You might want to clarify that “satellite 
data” does not necessarily come from a satellite!] 

We will see several ways to solve the sorting problem. Each way will be expressed
as an algorithm: a well-defined computational procedure that takes some value, or
set of values, as input and produces some value, or set of values, as output.


Expressing algorithms 

We express algorithms in whatever way is the clearest and most concise. 

English is sometimes the best way. 

When issues of control need to be made perfectly clear, we often use pseudocode. 
• Pseudocode is similar to C, C++, Pascal, and Java. If you know any of these
languages, you should be able to understand pseudocode.

• Pseudocode is designed for expressing algorithms to humans. Software engineering
issues of data abstraction, modularity, and error handling are often
ignored.

• We sometimes embed English statements into pseudocode. Therefore, unlike 
for “real” programming languages, we cannot create a compiler that translates 
pseudocode to machine code. 


Insertion sort 

A good algorithm for sorting a small number of elements. 

It works the way you might sort a hand of playing cards: 

• Start with an empty left hand and the cards face down on the table. 

• Then remove one card at a time from the table, and insert it into the correct
position in the left hand.

• To find the correct position for a card, compare it with each of the cards already 
in the hand, from right to left. 

• At all times, the cards held in the left hand are sorted, and these cards were 
originally the top cards of the pile on the table. 

Pseudocode: We use a procedure Insertion-Sort. 

• Takes as parameters an array A[1 .. n] and the length n of the array.

• As in Pascal, we use “..” to denote a range within an array. 

• [We usually use 1-origin indexing, as we do here. There are a few places in
later chapters where we use 0-origin indexing instead. If you are translating
pseudocode to C, C++, or Java, which use 0-origin indexing, you need to be
careful to get the indices right. One option is to adjust all index calculations
in the C, C++, or Java code to compensate. An easier option is, when using an
array A[1 .. n], to allocate the array to be one entry longer, A[0 .. n], and just
don’t use the entry at index 0.]

• [In the lecture notes, we indicate array lengths by parameters rather than by
using the length attribute that is used in the book. That saves us a line of pseudocode
each time. The solutions continue to use the length attribute.]

• The array A is sorted in place: the numbers are rearranged within the array,
with at most a constant number outside the array at any time.
Insertion-Sort(A)                                              cost   times
for j ← 2 to n                                                 c_1    n
    do key ← A[j]                                              c_2    n − 1
       ▷ Insert A[j] into the sorted sequence A[1 .. j − 1].   0      n − 1
       i ← j − 1                                               c_4    n − 1
       while i > 0 and A[i] > key                              c_5    $\sum_{j=2}^{n} t_j$
           do A[i + 1] ← A[i]                                  c_6    $\sum_{j=2}^{n} (t_j - 1)$
              i ← i − 1                                        c_7    $\sum_{j=2}^{n} (t_j - 1)$
       A[i + 1] ← key                                          c_8    n − 1



[Leave this on the board, but show only the pseudocode for now. We’ll put in the
“cost” and “times” columns later.]
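
For instructors who want a runnable counterpart, here is a minimal Python sketch of the procedure. It is a translation under stated assumptions, not the pseudocode of the text: Python lists use 0-origin indexing, so the outer loop runs over indices 1 through n − 1 and the test “i > 0” becomes “i >= 0”. The name `insertion_sort` is chosen for illustration.

```python
def insertion_sort(a):
    """Sort the list a in place into nondecreasing order and return it."""
    for j in range(1, len(a)):          # pseudocode j = 2 .. n, shifted to 0-origin
        key = a[j]
        # Insert a[j] into the sorted prefix a[0 .. j-1].
        i = j - 1
        while i >= 0 and a[i] > key:    # pseudocode test "i > 0" becomes "i >= 0"
            a[i + 1] = a[i]             # shift the larger element one place right
            i -= 1
        a[i + 1] = key
    return a
```

The list is sorted in place, as in the pseudocode; returning it as well is only a convenience for testing.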


Example: Insertion-Sort on A = ⟨5, 2, 4, 6, 1, 3⟩ (array indices 1 through 6).
Contents of A at the start of each iteration of the for loop, followed by the final array:

    j = 2:   5 2 4 6 1 3
    j = 3:   2 5 4 6 1 3
    j = 4:   2 4 5 6 1 3
    j = 5:   2 4 5 6 1 3
    j = 6:   1 2 4 5 6 3
    final:   1 2 3 4 5 6


[Read this figure row by row. Each part shows what happens for a particular iteration
with the value of j indicated. j indexes the “current card” being inserted into
the hand. Elements to the left of A[j] that are greater than A[j] move one position
to the right, and A[j] moves into the evacuated position. The heavy vertical lines
separate the part of the array in which an iteration works, A[1 .. j], from the part
of the array that is unaffected by this iteration, A[j + 1 .. n]. The last part of the
figure shows the final sorted array.]


Correctness 

We often use a loop invariant to help us understand why an algorithm gives the 
correct answer. Here’s the loop invariant for Insertion-Sort: 

Loop invariant: At the start of each iteration of the “outer” for loop (the
loop indexed by j), the subarray A[1 .. j − 1] consists of the elements originally
in A[1 .. j − 1] but in sorted order.

To use a loop invariant to prove correctness, we must show three things about it: 

Initialization: It is true prior to the first iteration of the loop. 

Maintenance: If it is true before an iteration of the loop, it remains true before the 
next iteration. 

Termination: When the loop terminates, the invariant—usually along with the 
reason that the loop terminated—gives us a useful property that helps show that 
the algorithm is correct. 

Using loop invariants is like mathematical induction: 











• To prove that a property holds, you prove a base case and an inductive step. 

• Showing that the invariant holds before the first iteration is like the base case. 

• Showing that the invariant holds from iteration to iteration is like the inductive 
step. 

• The termination part differs from the usual use of mathematical induction, in 
which the inductive step is used infinitely. We stop the “induction” when the 
loop terminates. 

• We can show the three parts in any order. 

For insertion sort: 

Initialization: Just before the first iteration, j = 2. The subarray A[1 .. j − 1]
is the single element A[1], which is the element originally in A[1], and it is
trivially sorted.

Maintenance: To be precise, we would need to state and prove a loop invariant
for the “inner” while loop. Rather than getting bogged down in another loop
invariant, we instead note that the body of the inner while loop works by moving
A[j − 1], A[j − 2], A[j − 3], and so on, by one position to the right until the
proper position for key (which has the value that started out in A[j]) is found.
At that point, the value of key is placed into this position.

Termination: The outer for loop ends when j > n; this occurs when j = n + 1.
Therefore, j − 1 = n. Plugging n in for j − 1 in the loop invariant, the subarray
A[1 .. n] consists of the elements originally in A[1 .. n] but in sorted order. In
other words, the entire array is sorted!
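
The three parts of the argument can also be checked mechanically on concrete inputs. The sketch below is an illustration in Python (0-origin indexing, hypothetical name `insertion_sort_checked`), not a proof: it asserts the loop invariant at the start of every outer iteration and asserts the termination property after the loop. Keeping a copy of the original input is an addition needed only for the checks.

```python
def insertion_sort_checked(a):
    """Insertion sort that asserts the outer-loop invariant as it runs."""
    original = list(a)                  # snapshot used only by the checks
    for j in range(1, len(a)):
        # Invariant: a[0 .. j-1] holds the elements originally in
        # a[0 .. j-1], in sorted order. (j = 1 is the initialization case.)
        assert a[:j] == sorted(original[:j])
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: the invariant with the whole array substituted in.
    assert a == sorted(original)
    return a
```

Running this on a few inputs in class shows the invariant holding before each iteration, which mirrors the initialization and maintenance steps.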

Pseudocode conventions 

[Covering most, but not all, here. See book pages 19-20 for all conventions.] 

• Indentation indicates block structure. Saves space and writing time. 

• Looping constructs are like in C, C++, Pascal, and Java. We assume that the 
loop variable in a for loop is still defined when the loop exits (unlike in Pascal). 

• “▷” indicates that the remainder of the line is a comment.

• Variables are local, unless otherwise specified. 

• We often use objects, which have attributes (equivalently, fields). For an attribute
attr of object x, we write attr[x]. (This would be the equivalent of
x.attr in Java or x->attr in C++.)

• Objects are treated as references, like in Java. If x and y denote objects, then 
the assignment y <— x makes x and y reference the same object. It does not 
cause attributes of one object to be copied to another. 

• Parameters are passed by value, as in Java and C (and the default mechanism in 
Pascal and C++). When an object is passed by value, it is actually a reference 
(or pointer) that is passed; changes to the reference itself are not seen by the 
caller, but changes to the object’s attributes are. 

• The boolean operators “and” and “or” are short-circuiting: if after evaluating
the left-hand operand, we know the result of the expression, then we don’t
evaluate the right-hand operand. (If x is FALSE in “x and y” then we don’t
evaluate y. If x is TRUE in “x or y” then we don’t evaluate y.)
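
Three of these conventions (objects as references, parameters passed by value, and short-circuiting operators) behave the same way in Python, so they can be demonstrated directly. The sketch below is illustrative only; the names `Record`, `rebind`, `mutate`, `operand`, and `conventions_demo` are assumptions introduced for the example.

```python
class Record:
    """A tiny object with one attribute, standing in for attr[x]."""

    def __init__(self, attr):
        self.attr = attr


def rebind(obj):
    # Rebinding the parameter changes only the callee's copy of the reference.
    obj = Record("rebound")


def mutate(obj):
    # Changing an attribute through the reference is visible to the caller.
    obj.attr = "mutated"


evaluated = []


def operand(name, value):
    """Record that this operand was evaluated, then return its value."""
    evaluated.append(name)
    return value


def conventions_demo():
    x = Record("original")
    y = x                           # y <- x: both names refer to one object
    shared = y is x
    rebind(x)
    after_rebind = x.attr           # unchanged by the callee's rebinding
    mutate(x)
    after_mutate = y.attr           # the attribute change is seen through y
    evaluated.clear()
    operand("left", False) and operand("right", True)
    and_trace = list(evaluated)     # right operand of "and" was skipped
    evaluated.clear()
    operand("left", True) or operand("right", False)
    or_trace = list(evaluated)      # right operand of "or" was skipped
    return shared, after_rebind, after_mutate, and_trace, or_trace
```

Short-circuiting is what makes the test “i > 0 and A[i] > key” in Insertion-Sort safe: when i reaches 0, A[i] is never examined.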
Analyzing algorithms

We want to predict the resources that the algorithm requires. Usually, running time. 
In order to predict resource requirements, we need a computational model. 

Random-access machine (RAM) model 

• Instructions are executed one after another. No concurrent operations. 

• It’s too tedious to define each of the instructions and their associated time costs. 

• Instead, we recognize that we’ll use instructions commonly found in real computers:

• Arithmetic: add, subtract, multiply, divide, remainder, floor, ceiling. Also,
shift left/shift right (good for multiplying/dividing by $2^k$).

• Data movement: load, store, copy. 

• Control: conditional/unconditional branch, subroutine call and return. 

Each of these instructions takes a constant amount of time. 

The RAM model uses integer and floating-point types. 

• We don’t worry about precision, although it is crucial in certain numerical
applications.

• There is a limit on the word size: when working with inputs of size n, assume
that integers are represented by c lg n bits for some constant c ≥ 1. (lg n is a
very frequently used shorthand for $\log_2 n$.)

• c ≥ 1 ⇒ we can hold the value of n ⇒ we can index the individual elements.

• c is a constant ⇒ the word size cannot grow arbitrarily.


How do we analyze an algorithm’s running time? 

The time taken by an algorithm depends on the input. 

• Sorting 1000 numbers takes longer than sorting 3 numbers. 

• A given sorting algorithm may even take differing amounts of time on two 
inputs of the same size. 

• For example, we’ll see that insertion sort takes less time to sort n elements when 
they are already sorted than when they are in reverse sorted order. 

Input size: Depends on the problem being studied. 

• Usually, the number of items in the input. Like the size n of the array being 
sorted. 

• But could be something else. If multiplying two integers, could be the total 
number of bits in the two integers. 

• Could be described by more than one number. For example, graph algorithm 
running times are usually expressed in terms of the number of vertices and the 
number of edges in the input graph. 
Running time: On a particular input, it is the number of primitive operations
(steps) executed. 

• Want to define steps to be machine-independent. 

• Figure that each line of pseudocode requires a constant amount of time. 

• One line may take a different amount of time than another, but each execution
of line i takes the same amount of time $c_i$.

• This is assuming that the line consists only of primitive operations. 

• If the line is a subroutine call, then the actual call takes constant time, but the 
execution of the subroutine being called might not. 

• If the line specifies operations other than primitive ones, then it might take
more than constant time. Example: “sort the points by x-coordinate.”


Analysis of insertion sort 

[Now add statement costs and number of times executed to Insertion-Sort 
pseudocode.] 

• Assume that the ith line takes time $c_i$, which is a constant. (Since the third line
is a comment, it takes no time.)

• For j = 2, 3, ..., n, let $t_j$ be the number of times that the while loop test is
executed for that value of j.

• Note that when a for or while loop exits in the usual way—due to the test in the 
loop header—the test is executed one time more than the loop body. 

The running time of the algorithm is

$$\sum_{\text{all statements}} (\text{cost of statement}) \cdot (\text{number of times statement is executed}) .$$

Let T(n) = running time of Insertion-Sort.

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1) .$$

The running time depends on the values of $t_j$. These vary according to the input.

Best case: The array is already sorted.

• Always find that A[i] ≤ key upon the first time the while loop test is run (when
i = j − 1).

• All $t_j$ are 1.

• Running time is

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 (n-1) + c_8 (n-1) = (c_1 + c_2 + c_4 + c_5 + c_8)\, n - (c_2 + c_4 + c_5 + c_8) .$$

• Can express T(n) as an + b for constants a and b (that depend on the statement
costs $c_i$) ⇒ T(n) is a linear function of n.
Worst case: The array is in reverse sorted order.

• Always find that A[i] > key in while loop test.

• Have to compare key with all elements to the left of the jth position ⇒ compare
with j − 1 elements.

• Since the while loop exits because i reaches 0, there’s one additional test after
the j − 1 tests ⇒ $t_j = j$.

• $\sum_{j=2}^{n} t_j = \sum_{j=2}^{n} j$ and $\sum_{j=2}^{n} (t_j - 1) = \sum_{j=2}^{n} (j - 1)$.

• $\sum_{j=1}^{n} j$ is known as an arithmetic series, and equation (A.1) shows that it equals
$\frac{n(n+1)}{2}$.

• Since $\sum_{j=2}^{n} j = \left( \sum_{j=1}^{n} j \right) - 1$, it equals $\frac{n(n+1)}{2} - 1$.

[The parentheses around the summation are not strictly necessary. They are
there for clarity, but it might be a good idea to remind the students that the
meaning of the expression would be the same even without the parentheses.]

• Letting k = j − 1, we see that $\sum_{j=2}^{n} (j - 1) = \sum_{k=1}^{n-1} k = \frac{n(n-1)}{2}$.

• Running time is

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \left( \frac{n(n+1)}{2} - 1 \right) + c_6 \left( \frac{n(n-1)}{2} \right) + c_7 \left( \frac{n(n-1)}{2} \right) + c_8 (n-1)$$

$$= \left( \frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2} \right) n^2 + \left( c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8 \right) n - (c_2 + c_4 + c_5 + c_8) .$$

• Can express T(n) as $an^2 + bn + c$ for constants a, b, c (that again depend on
statement costs) ⇒ T(n) is a quadratic function of n.
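
The best-case and worst-case values of the while-loop test counts can be confirmed by instrumenting the procedure. The following Python sketch (0-origin indexing; the name `while_test_counts` is an assumption for illustration) records, for each outer iteration, how many times the while loop test is executed, counting the final failing test as well.

```python
def while_test_counts(a):
    """Run insertion sort on a copy of a and return the list [t_2, ..., t_n]."""
    a = list(a)
    counts = []
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        tests = 1                       # the test that finally fails
        while i >= 0 and a[i] > key:
            tests += 1                  # one successful test per shift
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
        counts.append(tests)
    return counts
```

For n = 6, a sorted input gives a count of 1 in every iteration (sum n − 1 = 5), while a reverse-sorted input gives counts 2, 3, 4, 5, 6, whose sum 20 agrees with n(n + 1)/2 − 1.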


Worst-case and average-case analysis 

We usually concentrate on finding the worst-case running time: the longest running
time for any input of size n.

Reasons: 

• The worst-case running time gives a guaranteed upper bound on the running 
time for any input. 

• For some algorithms, the worst case occurs often. For example, when searching,
the worst case often occurs when the item being searched for is not present,
and searches for absent items may be frequent.

• Why not analyze the average case? Because it’s often about as bad as the worst
case.
Example: Suppose that we randomly choose n numbers as the input to insertion
sort.

On average, the key in A[j] is less than half the elements in A[1 .. j − 1] and
it’s greater than the other half.

⇒ On average, the while loop has to look halfway through the sorted subarray
A[1 .. j − 1] to decide where to drop key.

⇒ $t_j \approx j/2$.

Although the average-case running time is approximately half of the worst-case 
running time, it’s still a quadratic function of n. 
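
For small n, the average can be computed exactly by enumerating every ordering of n distinct keys. The Python sketch below does so under the assumption that all n! orderings are equally likely; the function names are hypothetical. Under that assumption the expected number of shifts in iteration j is (j − 1)/2, so the expected test count is (j + 1)/2, which is close to the j/2 estimate, and the expected total over j = 2, ..., n is (n² + 3n − 4)/4.

```python
from itertools import permutations


def total_while_tests(a):
    """Total number of while-loop tests insertion sort performs on a."""
    a = list(a)
    total = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        total += 1                      # the test that finally fails
        while i >= 0 and a[i] > key:
            total += 1
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return total


def average_total_tests(n):
    """Average of total_while_tests over all orderings of n distinct keys."""
    orderings = list(permutations(range(n)))
    return sum(total_while_tests(p) for p in orderings) / len(orderings)
```

For n = 4 the average is 6 tests, between the best case of 3 and the worst case of 9, and the leading term n²/4 is half of the worst case’s n²/2.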


Order of growth 

Another abstraction to ease analysis and focus on the important features. 

Look only at the leading term of the formula for running time. 

• Drop lower-order terms. 

• Ignore the constant coefficient in the leading term. 

Example: For insertion sort, we already abstracted away the actual statement costs
to conclude that the worst-case running time is $an^2 + bn + c$.

Drop lower-order terms ⇒ $an^2$.

Ignore constant coefficient ⇒ $n^2$.

But we cannot say that the worst-case running time T(n) equals $n^2$.

It grows like $n^2$. But it doesn’t equal $n^2$.

We say that the running time is $\Theta(n^2)$ to capture the notion that the order of growth
is $n^2$.

We usually consider one algorithm to be more efficient than another if its worst- 
case running time has a smaller order of growth. 


Designing algorithms 

There are many ways to design algorithms. 

For example, insertion sort is incremental: having sorted A[1 .. j − 1], place A[j]
correctly, so that A[1 .. j] is sorted.


Divide and conquer 

Another common approach. 

Divide the problem into a number of subproblems. 

Conquer the subproblems by solving them recursively. 

Base case: If the subproblems are small enough, just solve them by brute force. 

[It would be a good idea to make sure that your students are comfortable with 
recursion. If they are not, then they will have a hard time understanding divide 
and conquer.] 

Combine the subproblem solutions to give a solution to the original problem. 
Merge sort

A sorting algorithm based on divide and conquer. Its worst-case running time has 
a lower order of growth than insertion sort. 

Because we are dealing with subproblems, we state each subproblem as sorting
a subarray A[p .. r]. Initially, p = 1 and r = n, but these values change as we
recurse through subproblems.

To sort A[p .. r]:

Divide by splitting into two subarrays A[p .. q] and A[q + 1 .. r], where q is the
halfway point of A[p .. r].

Conquer by recursively sorting the two subarrays A[p .. q] and A[q + 1 .. r].

Combine by merging the two sorted subarrays A[p .. q] and A[q + 1 .. r] to produce
a single sorted subarray A[p .. r]. To accomplish this step, we’ll define a
procedure Merge(A, p, q, r).

The recursion bottoms out when the subarray has just 1 element, so that it’s trivially 
sorted. 

Merge-Sort(A, p, r)
if p < r                                ▷ Check for base case
   then q ← ⌊(p + r)/2⌋                 ▷ Divide
        Merge-Sort(A, p, q)             ▷ Conquer
        Merge-Sort(A, q + 1, r)         ▷ Conquer
        Merge(A, p, q, r)               ▷ Combine

Initial call: Merge-Sort(A, 1, n)

[It is astounding how often students forget how easy it is to compute the halfway
point of p and r as their average (p + r)/2. We of course have to take the floor
to ensure that we get an integer index q. But it is common to see students perform
calculations like p + (r − p)/2, or even more elaborate expressions, forgetting the
easy way to compute an average.]
[Examples when n is a power of 2 are most straightforward, but students might
also want an example when n is not a power of 2.]

Bottom-up view for n = 11: 
[Here, at the next-to-last level of recursion, some of the subproblems have only 1
element. The recursion bottoms out on these single-element subproblems.]
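
The shape of the recursion for n = 11 can be tabulated with a few lines of code. The Python sketch below follows the Divide step of Merge-Sort with 1-origin indices p and r and collects the subproblems at each depth of the recursion; the function name `subproblems_by_depth` and its bookkeeping parameters are assumptions made for illustration.

```python
def subproblems_by_depth(p, r, depth=0, levels=None):
    """Return a list whose entry d holds the (p, r) subproblems at depth d."""
    if levels is None:
        levels = []
    if depth == len(levels):
        levels.append([])
    levels[depth].append((p, r))
    if p < r:                           # same base-case test as Merge-Sort
        q = (p + r) // 2                # floor of the average of p and r
        subproblems_by_depth(p, q, depth + 1, levels)
        subproblems_by_depth(q + 1, r, depth + 1, levels)
    return levels


def sizes_by_depth(n):
    """Sizes r - p + 1 of the subproblems at each depth for an n-element array."""
    return [[r - p + 1 for (p, r) in level]
            for level in subproblems_by_depth(1, n)]
```

For n = 11 the sizes by depth are 11; then 6, 5; then 3, 3, 3, 2; then 2, 1, 2, 1, 2, 1, 1, 1; then six subproblems of size 1. The next-to-last level therefore already contains single-element subproblems, as the aside notes.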


Merging 

What remains is the Merge procedure. 

Input: Array A and indices p, q, r such that

• p ≤ q < r.

• Subarray A[p .. q] is sorted and subarray A[q + 1 .. r] is sorted. By the
restrictions on p, q, r, neither subarray is empty.

Output: The two subarrays are merged into a single sorted subarray in A[p .. r].

We implement it so that it takes $\Theta(n)$ time, where n = r − p + 1 = the number of
elements being merged.

What is n? Until now, n has stood for the size of the original problem. But now 
we’re using it as the size of a subproblem. We will use this technique when we 
analyze recursive algorithms. Although we may denote the original problem size 
by n, in general n will be the size of a given subproblem. 

Idea behind linear-time merging: Think of two piles of cards. 

• Each pile is sorted and placed face-up on a table with the smallest cards on top. 

• We will merge these into a single sorted pile, face-down on the table. 

• A basic step: 

• Choose the smaller of the two top cards. 
• Remove it from its pile, thereby exposing a new top card.

• Place the chosen card face-down onto the output pile. 

• Repeatedly perform basic steps until one input pile is empty. 

• Once one input pile empties, just take the remaining input pile and place it 
face-down onto the output pile. 

• Each basic step should take constant time, since we check just the two top cards. 

• There are ≤ n basic steps, since each basic step removes one card from the
input piles, and we started with n cards in the input piles.

• Therefore, this procedure should take $\Theta(n)$ time.

We don’t actually need to check whether a pile is empty before each basic step. 

• Put on the bottom of each input pile a special sentinel card. 

• It contains a special value that we use to simplify the code. 

• We use ∞, since that’s guaranteed to “lose” to any other value.

• The only way that ∞ cannot lose is when both piles have ∞ exposed as their
top cards.

• But when that happens, all the nonsentinel cards have already been placed into
the output pile.

• We know in advance that there are exactly r − p + 1 nonsentinel cards ⇒ stop
once we have performed r − p + 1 basic steps. Never a need to check for
sentinels, since they’ll always lose.

• Rather than even counting basic steps, just fill up the output array from index p
up through and including index r.

Pseudocode: 

Merge(A, p, q, r)
n_1 ← q − p + 1
n_2 ← r − q
create arrays L[1 .. n_1 + 1] and R[1 .. n_2 + 1]
for i ← 1 to n_1
    do L[i] ← A[p + i − 1]
for j ← 1 to n_2
    do R[j] ← A[q + j]
L[n_1 + 1] ← ∞
R[n_2 + 1] ← ∞
i ← 1
j ← 1
for k ← p to r
    do if L[i] ≤ R[j]
          then A[k] ← L[i]
               i ← i + 1
          else A[k] ← R[j]
               j ← j + 1

[The book uses a loop invariant to establish that Merge works correctly. In a 
lecture situation, it is probably better to use an example to show that the procedure 
works correctly.] 
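
A runnable version of the sentinel-based merge, together with the recursive sort that calls it, might look like the following Python sketch. It makes several assumptions not in the text: indices are 0-origin and inclusive, the keys are finite numbers so that `math.inf` can serve as the sentinel, and the names `merge` and `merge_sort` with default arguments are chosen for convenience.

```python
import math


def merge(a, p, q, r):
    """Merge sorted a[p .. q] and a[q+1 .. r] (inclusive) into sorted a[p .. r]."""
    left = a[p:q + 1] + [math.inf]      # copy of a[p .. q] plus a sentinel
    right = a[q + 1:r + 1] + [math.inf] # copy of a[q+1 .. r] plus a sentinel
    i = j = 0
    for k in range(p, r + 1):           # exactly r - p + 1 basic steps
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


def merge_sort(a, p=0, r=None):
    """Sort a[p .. r] in place by divide and conquer and return a."""
    if r is None:
        r = len(a) - 1
    if p < r:                           # more than one element
        q = (p + r) // 2                # divide
        merge_sort(a, p, q)             # conquer
        merge_sort(a, q + 1, r)         # conquer
        merge(a, p, q, r)               # combine
    return a
```

Because the final loop runs exactly r − p + 1 times, neither sentinel is ever copied back into the array, which matches the card-pile argument above.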
Example: A call of Merge(A, 9, 12, 16)

(Figure omitted: successive contents of A[8 .. 17], L[1 .. 5], and R[1 .. 5], with the
index positions marked.)


[Read this figure row by row. The first part shows the arrays at the start of the
“for k ← p to r” loop, where A[p .. q] is copied into L[1 .. n_1] and A[q + 1 .. r] is
copied into R[1 .. n_2]. Succeeding parts show the situation at the start of successive
iterations. Entries in A with slashes have had their values copied to either L or R
and have not had a value copied back in yet. Entries in L and R with slashes have
been copied back into A. The last part shows that the subarrays are merged back
into A[p .. r], which is now sorted, and that only the sentinels (∞) are exposed in
the arrays L and R.]

Running time: The first two for loops take $\Theta(n_1 + n_2) = \Theta(n)$ time. The last for
loop makes n iterations, each taking constant time, for $\Theta(n)$ time.

Total time: $\Theta(n)$.
Analyzing divide-and-conquer algorithms

Use a recurrence equation (more commonly, a recurrence ) to describe the running 
time of a divide-and-conquer algorithm. 

Let T (n) = running time on a problem of size n. 

• If the problem size is small enough (say, n ≤ c for some constant c), we have a
base case. The brute-force solution takes constant time: $\Theta(1)$.

• Otherwise, suppose that we divide into a subproblems, each 1/b the size of the
original. (In merge sort, a = b = 2.)

• Let the time to divide a size-n problem be D(n).

• There are a subproblems to solve, each of size n/b ⇒ each subproblem takes
T(n/b) time to solve ⇒ we spend aT(n/b) time solving subproblems.

• Let the time to combine solutions be C(n).

• We get the recurrence

$$T(n) = \begin{cases} \Theta(1) & \text{if } n \le c , \\ aT(n/b) + D(n) + C(n) & \text{otherwise} . \end{cases}$$


Analyzing merge sort 


For simplicity, assume that n is a power of 2 ⇒ each divide step yields two subproblems,
both of size exactly n/2.

The base case occurs when n = 1.

When n ≥ 2, time for merge sort steps:

Divide: Just compute q as the average of p and r ⇒ $D(n) = \Theta(1)$.

Conquer: Recursively solve 2 subproblems, each of size n/2 ⇒ 2T(n/2).

Combine: Merge on an n-element subarray takes $\Theta(n)$ time ⇒ $C(n) = \Theta(n)$.

Since $D(n) = \Theta(1)$ and $C(n) = \Theta(n)$, summed together they give a function that
is linear in n: $\Theta(n)$ ⇒ recurrence for merge sort running time is

$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 , \\ 2T(n/2) + \Theta(n) & \text{if } n > 1 . \end{cases}$$



Solving the merge-sort recurrence: By the master theorem in Chapter 4, we can
show that this recurrence has the solution $T(n) = \Theta(n \lg n)$. [Reminder: lg n
stands for $\log_2 n$.]

Compared to insertion sort ($\Theta(n^2)$ worst-case time), merge sort is faster. Trading
a factor of n for a factor of lg n is a good deal.

On small inputs, insertion sort may be faster. But for large enough inputs, merge
sort will always be faster, because its running time grows more slowly than insertion
sort’s.

We can understand how to solve the merge-sort recurrence without the master theorem.


• Let c be a constant that describes the running time for the base case and also 
is the time per array element for the divide and conquer steps. [Of course, we 
cannot necessarily use the same constant for both. It’s not worth going into this 
detail at this point.] 

• We rewrite the recurrence as

$$T(n) = \begin{cases} c & \text{if } n = 1 , \\ 2T(n/2) + cn & \text{if } n > 1 . \end{cases}$$

• Draw a recursion tree, which shows successive expansions of the recurrence. 

• For the original problem, we have a cost of cn, plus the two subproblems, each
costing T(n/2):

[Figure: recursion tree with root cn and two children, each T(n/2).]

• For each of the size-n/2 subproblems, we have a cost of cn/2, plus two
subproblems, each costing T(n/4):

[Figure: recursion tree with root cn, two children each costing cn/2, and four grandchildren, each T(n/4).]

• Continue expanding until the problem sizes get down to 1:

[Figure: fully expanded recursion tree with n leaves. Total: cn lg n + cn.]

• Each level has cost cn.

• The top level has cost cn. 

• The next level down has 2 subproblems, each contributing cost cn/2. 

• The next level has 4 subproblems, each contributing cost cn/4.

• Each time we go down one level, the number of subproblems doubles but the
cost per subproblem halves ⇒ cost per level stays the same.

• There are lg n + 1 levels (height is lg n).

• Use induction.

• Base case: n = 1 ⇒ 1 level, and lg 1 + 1 = 0 + 1 = 1.

• Inductive hypothesis is that a tree for a problem size of 2^i has lg 2^i + 1 = i + 1
levels.

• Because we assume that the problem size is a power of 2, the next problem
size up after 2^i is 2^(i+1).

• A tree for a problem size of 2^(i+1) has one more level than the size-2^i tree ⇒
i + 2 levels.

• Since lg 2^(i+1) + 1 = i + 2, we're done with the inductive argument.

• Total cost is sum of costs at each level. Have lg n + 1 levels, each costing cn ⇒
total cost is cn lg n + cn.

• Ignore low-order term of cn and constant coefficient c ⇒ Θ(n lg n).
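The recursion-tree total can be checked numerically. The following Python sketch is not part of the original notes; the function names `merge_sort_cost` and `closed_form` are made up for illustration. It evaluates the recurrence T(1) = c, T(n) = 2T(n/2) + cn directly and compares it with cn lg n + cn, assuming n is an exact power of 2.

```python
def merge_sort_cost(n, c=1):
    """Evaluate T(n) = 2T(n/2) + cn with T(1) = c, for n a power of 2."""
    if n == 1:
        return c
    return 2 * merge_sort_cost(n // 2, c) + c * n


def closed_form(n, c=1):
    """Recursion-tree total cn lg n + cn, for n a power of 2."""
    lg_n = n.bit_length() - 1  # exact lg n when n is a power of 2
    return c * n * lg_n + c * n


# The two agree on every power of 2 tried here.
for k in range(11):
    n = 2 ** k
    assert merge_sort_cost(n, c=3) == closed_form(n, c=3)
```

The check only covers exact powers of 2, which matches the simplifying assumption made in the analysis above.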



Solutions for Chapter 2: 
Getting Started 


Solution to Exercise 2.2-2 

Selection-Sort(A)
n ← length[A]
for j ← 1 to n - 1
    do smallest ← j
       for i ← j + 1 to n
           do if A[i] < A[smallest]
                 then smallest ← i
       exchange A[j] ↔ A[smallest]

The algorithm maintains the loop invariant that at the start of each iteration of the
outer for loop, the subarray A[1 .. j - 1] consists of the j - 1 smallest elements
in the array A[1 .. n], and this subarray is in sorted order. After the first n - 1
elements, the subarray A[1 .. n - 1] contains the smallest n - 1 elements, sorted,
and therefore element A[n] must be the largest element.

The running time of the algorithm is Θ(n^2) for all cases.
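For readers who want to run the procedure, here is a minimal Python sketch of the same algorithm. It is an illustrative translation, not the manual's own code: it uses 0-based indexing, so the outer loop runs over positions 0 through n - 2, and the name `selection_sort` is an assumption.

```python
def selection_sort(a):
    """Sort the list a in place and return it (0-based Selection-Sort)."""
    n = len(a)
    for j in range(n - 1):
        # Find the index of the smallest element in a[j .. n-1].
        smallest = j
        for i in range(j + 1, n):
            if a[i] < a[smallest]:
                smallest = i
        # Move it into position j; a[0 .. j] now holds the j + 1 smallest.
        a[j], a[smallest] = a[smallest], a[j]
    return a
```

The inner loop always scans the whole unsorted suffix, which is why the running time does not depend on the input order.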


Solution to Exercise 2.2-4 

Modify the algorithm so it tests whether the input satisfies some special-case
condition and, if it does, output a pre-computed answer. The best-case running time is
generally not a good measure of an algorithm.


Solution to Exercise 2.3-3 


The base case is when n = 2, and we have n lg n = 2 lg 2 = 2 · 1 = 2.

For the inductive step, our inductive hypothesis is that T(n/2) = (n/2) lg(n/2).
Then

\begin{align*}
T(n) &= 2T(n/2) + n \\
     &= 2(n/2) \lg(n/2) + n \\
     &= n(\lg n - 1) + n \\
     &= n \lg n - n + n \\
     &= n \lg n ,
\end{align*}

which completes the inductive proof for exact powers of 2.


Solution to Exercise 2.3-4 


Since it takes Θ(n) time in the worst case to insert A[n] into the sorted array
A[1 .. n - 1], we get the recurrence

\[
T(n) = \begin{cases}
\Theta(1) & \text{if } n = 1 , \\
T(n-1) + \Theta(n) & \text{if } n > 1 .
\end{cases}
\]

The solution to this recurrence is T(n) = Θ(n^2).


Solution to Exercise 2.3-5 

Procedure Binary-Search takes a sorted array A, a value v, and a range
[low .. high] of the array, in which we search for the value v. The procedure
compares v to the array entry at the midpoint of the range and decides to eliminate half
the range from further consideration. We give both iterative and recursive versions,
each of which returns either an index i such that A[i] = v, or NIL if no entry of
A[low .. high] contains the value v. The initial call to either version should have
the parameters A, v, 1, n.

Iterative-Binary-Search(A, v, low, high)
while low ≤ high
    do mid ← ⌊(low + high)/2⌋
       if v = A[mid]
          then return mid
       if v > A[mid]
          then low ← mid + 1
          else high ← mid - 1
return NIL

Recursive-Binary-Search(A, v, low, high)
if low > high
   then return NIL
mid ← ⌊(low + high)/2⌋
if v = A[mid]
   then return mid
if v > A[mid]
   then return Recursive-Binary-Search(A, v, mid + 1, high)
   else return Recursive-Binary-Search(A, v, low, mid - 1)

Both procedures terminate the search unsuccessfully when the range is empty (i.e.,
low > high) and terminate it successfully if the value v has been found. Based
on the comparison of v to the middle element in the searched range, the search
continues with the range halved. The recurrence for these procedures is therefore
T(n) = T(n/2) + Θ(1), whose solution is T(n) = Θ(lg n).
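Both versions translate almost line for line into Python. The sketch below is an illustration only: it assumes 0-based indexing (so the initial call uses the range 0 .. len(a) - 1) and returns `None` where the pseudocode returns NIL; the function names are chosen here, not taken from the text.

```python
def iterative_binary_search(a, v, low, high):
    """Return an index i in low..high with a[i] == v, or None."""
    while low <= high:
        mid = (low + high) // 2  # floor of the midpoint
        if v == a[mid]:
            return mid
        if v > a[mid]:
            low = mid + 1        # discard the left half
        else:
            high = mid - 1       # discard the right half
    return None


def recursive_binary_search(a, v, low, high):
    """Recursive version with the same contract."""
    if low > high:
        return None              # empty range: unsuccessful search
    mid = (low + high) // 2
    if v == a[mid]:
        return mid
    if v > a[mid]:
        return recursive_binary_search(a, v, mid + 1, high)
    return recursive_binary_search(a, v, low, mid - 1)
```

A typical call is `iterative_binary_search(a, v, 0, len(a) - 1)` on a list `a` sorted in nondecreasing order.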


Solution to Exercise 2.3-6 

The while loop of lines 5-7 of procedure Insertion-Sort scans backward
through the sorted array A[1 .. j - 1] to find the appropriate place for A[j]. The
hitch is that the loop not only searches for the proper place for A[j], but that it also
moves each of the array elements that are bigger than A[j] one position to the right
(line 6). These movements can take as much as Θ(j) time, which occurs when all
the j - 1 elements preceding A[j] are larger than A[j]. We can use binary search
to improve the running time of the search to Θ(lg j), but binary search will have no
effect on the running time of moving the elements. Therefore, binary search alone
cannot improve the worst-case running time of Insertion-Sort to Θ(n lg n).


Solution to Exercise 2.3-7 

The following algorithm solves the problem: 

1. Sort the elements in S. 

2. Form the set S' = {z : z = x - y for some y ∈ S}.

3. Sort the elements in S'. 

4. If any value in S appears more than once, remove all but one instance. Do the 
same for S'. 

5. Merge the two sorted sets S and S'. 

6. There exist two elements in S whose sum is exactly x if and only if the same
value appears in consecutive positions in the merged output.

To justify the claim in step 6, first observe that if any value appears twice in the
merged output, it must appear in consecutive positions. Thus, we can restate the
condition in step 6 as: there exist two elements in S whose sum is exactly x if and
only if the same value appears twice in the merged output.

Suppose that some value w appears twice. Then w appeared once in S and once
in S'. Because w appeared in S', there exists some y ∈ S such that w = x - y, or
x = w + y. Since w ∈ S, the elements w and y are in S and sum to x.

Conversely, suppose that there are values w, y ∈ S such that w + y = x. Then,
since x - y = w, the value w appears in S'. Thus, w is in both S and S', and so it
will appear twice in the merged output.

Steps 1 and 3 require O(n lg n) steps. Steps 2, 4, 5, and 6 require O(n) steps. Thus
the overall running time is O(n lg n).
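The six steps can be sketched in Python as follows. This is an illustrative rendering under stated assumptions, not the manual's code: S is passed as a list, the helper names `dedupe_sorted`, `merge_sorted`, and `has_pair_with_sum` are invented here, and Python's built-in `sorted` stands in for an O(n lg n) sort. Because the sketch follows the steps literally, a single element w with 2w = x is reported as a match (w and y coincide).

```python
def dedupe_sorted(values):
    """Step 4: drop repeated values from an already sorted list."""
    out = []
    for v in values:
        if not out or out[-1] != v:
            out.append(v)
    return out


def merge_sorted(a, b):
    """Step 5: standard linear-time merge of two sorted lists."""
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


def has_pair_with_sum(s, x):
    """Return True if some w, y in s satisfy w + y == x."""
    a = dedupe_sorted(sorted(s))                      # steps 1 and 4
    b = dedupe_sorted(sorted(x - y for y in s))       # steps 2, 3, and 4
    merged = merge_sorted(a, b)                       # step 5
    # Step 6: a repeated value must sit in consecutive positions.
    return any(merged[k] == merged[k + 1] for k in range(len(merged) - 1))
```

For example, `has_pair_with_sum([10, 1, 7, 4], 11)` finds the pair 1 + 10 (and also 4 + 7).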


Solution to Problem 2-1 

[It may be better to assign this problem after covering asymptotic notation in
Section 3.1; otherwise part (c) may be too difficult.]

a. Insertion sort takes Θ(k^2) time per k-element list in the worst case. Therefore,
sorting n/k lists of k elements each takes Θ(k^2 · n/k) = Θ(nk) worst-case time.

b. Just extending the 2-list merge to merge all the lists at once would take
Θ(n · (n/k)) = Θ(n^2/k) time (n from copying each element once into the
result list, n/k from examining n/k lists at each step to select next item for
result list).

To achieve Θ(n lg(n/k))-time merging, we merge the lists pairwise, then merge
the resulting lists pairwise, and so on, until there's just one list. The pairwise
merging requires Θ(n) work at each level, since we are still working on n
elements, even if they are partitioned among sublists. The number of levels,
starting with n/k lists (with k elements each) and finishing with 1 list (with n
elements), is ⌈lg(n/k)⌉. Therefore, the total running time for the merging is
Θ(n lg(n/k)).

c. The modified algorithm has the same asymptotic running time as standard
merge sort when Θ(nk + n lg(n/k)) = Θ(n lg n). The largest asymptotic value
of k as a function of n that satisfies this condition is k = Θ(lg n).

To see why, first observe that k cannot be more than Θ(lg n) (i.e., it can't have
a higher-order term than lg n), for otherwise the left-hand expression wouldn't
be Θ(n lg n) (because it would have a higher-order term than n lg n). So all
we need to do is verify that k = Θ(lg n) works, which we can do by plugging
k = lg n into Θ(nk + n lg(n/k)) = Θ(nk + n lg n - n lg k) to get

Θ(n lg n + n lg n - n lg lg n) = Θ(2n lg n - n lg lg n) ,

which, by taking just the high-order term and ignoring the constant coefficient,
equals Θ(n lg n).

d. In practice, k should be the largest list length on which insertion sort is faster 
than merge sort. 
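A minimal Python sketch of the modified algorithm may make the role of k concrete. Everything here is an assumption made for illustration: the function names, the 0-based inclusive index ranges, and the default cutoff k = 8, which is a placeholder rather than a measured value. The cutoff must be at least 1.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place by insertion."""
    for j in range(lo + 1, hi + 1):
        key = a[j]
        i = j - 1
        while i >= lo and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key


def merge(a, lo, mid, hi):
    """Merge the sorted runs a[lo..mid] and a[mid+1..hi] in place."""
    left = a[lo:mid + 1]
    right = a[mid + 1:hi + 1]
    i = j = 0
    for k in range(lo, hi + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


def hybrid_merge_sort(a, lo=0, hi=None, k=8):
    """Merge sort that hands sublists of length <= k to insertion sort."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= k:
        insertion_sort(a, lo, hi)   # coarsened leaf of the recursion
        return a
    mid = (lo + hi) // 2
    hybrid_merge_sort(a, lo, mid, k)
    hybrid_merge_sort(a, mid + 1, hi, k)
    merge(a, lo, mid, hi)
    return a
```

In line with part (d), a sensible value of k would be found by timing insertion sort against merge sort on short lists on the target machine.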
Solution to Problem 2-2

a. We need to show that the elements of A' form a permutation of the elements
of A.

b. Loop invariant: At the start of each iteration of the for loop of lines 2-4,
A[j] = min {A[k] : j ≤ k ≤ n} and the subarray A[j .. n] is a permutation
of the values that were in A[j .. n] at the time that the loop started.

Initialization: Initially, j = n, and the subarray A[j .. n] consists of the single
element A[n]. The loop invariant trivially holds.

Maintenance: Consider an iteration for a given value of j. By the loop
invariant, A[j] is the smallest value in A[j .. n]. Lines 3-4 exchange A[j]
and A[j - 1] if A[j] is less than A[j - 1], and so A[j - 1] will be the
smallest value in A[j - 1 .. n] afterward. Since the only change to the
subarray A[j - 1 .. n] is this possible exchange, and the subarray A[j .. n] is
a permutation of the values that were in A[j .. n] at the time that the loop
started, we see that A[j - 1 .. n] is a permutation of the values that were in
A[j - 1 .. n] at the time that the loop started. Decrementing j for the next
iteration maintains the invariant.

Termination: The loop terminates when j reaches i. By the statement of the
loop invariant, A[i] = min {A[k] : i ≤ k ≤ n} and A[i .. n] is a permutation
of the values that were in A[i .. n] at the time that the loop started.

c. Loop invariant: At the start of each iteration of the for loop of lines 1-4,
the subarray A[1 .. i - 1] consists of the i - 1 smallest values originally in
A[1 .. n], in sorted order, and A[i .. n] consists of the n - i + 1 remaining
values originally in A[1 .. n].

Initialization: Before the first iteration of the loop, i = 1. The subarray
A[1 .. i - 1] is empty, and so the loop invariant vacuously holds.

Maintenance: Consider an iteration for a given value of i. By the loop
invariant, A[1 .. i - 1] consists of the i - 1 smallest values in A[1 .. n], in sorted order.
Part (b) showed that after executing the for loop of lines 2-4, A[i] is the
smallest value in A[i .. n], and so A[1 .. i] is now the i smallest values
originally in A[1 .. n], in sorted order. Moreover, since the for loop of lines 2-4
permutes A[i .. n], the subarray A[i + 1 .. n] consists of the n - i remaining
values originally in A[1 .. n].

Termination: The for loop of lines 1-4 terminates when i = n + 1, so that
i - 1 = n. By the statement of the loop invariant, A[1 .. i - 1] is the entire
array A[1 .. n], and it consists of the original array A[1 .. n], in sorted order.

Note: We have received requests to change the upper bound of the outer for
loop of lines 1-4 to length[A] - 1. That change would also result in a correct
algorithm. The loop would terminate when i = n, so that according to the loop
invariant, A[1 .. n - 1] would consist of the n - 1 smallest values originally
in A[1 .. n], in sorted order, and A[n] would contain the remaining element,
which must be the largest in A[1 .. n]. Therefore, A[1 .. n] would be sorted.
In the original pseudocode, the last iteration of the outer for loop results in no
iterations of the inner for loop of lines 2-4. With the upper bound for i set to
length[A] - 1, the last iteration of the outer loop would result in one iteration of the
inner loop. Either bound, length[A] or length[A] - 1, yields a correct algorithm.

d. The running time depends on the number of iterations of the for loop of
lines 2-4. For a given value of i, this loop makes n - i iterations, and i takes
on the values 1, 2, ..., n. The total number of iterations, therefore, is

\begin{align*}
\sum_{i=1}^{n} (n - i) &= \sum_{i=1}^{n} n - \sum_{i=1}^{n} i \\
&= n^2 - \frac{n(n+1)}{2} \\
&= n^2 - \frac{n^2}{2} - \frac{n}{2} \\
&= \frac{n^2}{2} - \frac{n}{2} .
\end{align*}

Thus, the running time of bubblesort is Θ(n^2) in all cases. The worst-case
running time is the same as that of insertion sort.
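The iteration count can be confirmed by instrumenting the algorithm. The Python sketch below is illustrative: it uses 0-based indexing and an added counter, and the name `bubblesort` and its return value are assumptions rather than part of the problem statement.

```python
def bubblesort(a):
    """Sort a in place; return the number of inner-loop iterations."""
    n = len(a)
    iterations = 0
    for i in range(n):                  # pseudocode: i = 1 .. n
        for j in range(n - 1, i, -1):   # pseudocode: j = n downto i + 1
            iterations += 1
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
    return iterations


# The count is n^2/2 - n/2 = n(n - 1)/2 regardless of the input order.
for data in ([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1], [3, 1, 6, 2, 5, 4]):
    n = len(data)
    assert bubblesort(data) == n * (n - 1) // 2
    assert data == [1, 2, 3, 4, 5, 6]
```

Sorted, reversed, and shuffled inputs all give the same count, which is the sense in which the bound holds "in all cases".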



Solution to Problem 2-4 

a. The inversions are (1, 5), (2, 5), (3, 4), (3, 5), (4, 5). (Remember that
inversions are specified by indices rather than by the values in the array.)

b. The array with elements from {1, 2, ..., n} with the most inversions is ⟨n,
n - 1, n - 2, ..., 2, 1⟩. For all 1 ≤ i < j ≤ n, there is an inversion (i, j). The
number of such inversions is (n choose 2) = n(n - 1)/2.

c. Suppose that the array A starts out with an inversion (k, j). Then k < j and
A[k] > A[j]. At the time that the outer for loop of lines 1-8 sets key ← A[j],
the value that started in A[k] is still somewhere to the left of A[j]. That is,
it's in A[i], where 1 ≤ i < j, and so the inversion has become (i, j). Some
iteration of the while loop of lines 5-7 moves A[i] one position to the right.
Line 8 will eventually drop key to the left of this element, thus eliminating the
inversion. Because line 5 moves only elements that are greater than key, it moves
only elements that correspond to inversions. In other words, each iteration of
the while loop of lines 5-7 corresponds to the elimination of one inversion.

d. We follow the hint and modify merge sort to count the number of inversions in
Θ(n lg n) time.

To start, let us define a merge-inversion as a situation within the execution of
merge sort in which the Merge procedure, after copying A[p .. q] to L and
A[q + 1 .. r] to R, has values x in L and y in R such that x > y. Consider
an inversion (i, j), and let x = A[i] and y = A[j], so that i < j and x > y.
We claim that if we were to run merge sort, there would be exactly one
merge-inversion involving x and y. To see why, observe that the only way in which
array elements change their positions is within the Merge procedure. Moreover,
since Merge keeps elements within L in the same relative order to each other,
and correspondingly for R, the only way in which two elements can change
their ordering relative to each other is for the greater one to appear in L and the
lesser one to appear in R. Thus, there is at least one merge-inversion involving
x and y. To see that there is exactly one such merge-inversion, observe that
after any call of Merge that involves both x and y, they are in the same sorted
subarray and will therefore both appear in L or both appear in R in any given
call thereafter. Thus, we have proven the claim.

We have shown that every inversion implies one merge-inversion. In fact, the
correspondence between inversions and merge-inversions is one-to-one.
Suppose we have a merge-inversion involving values x and y, where x originally
was A[i] and y was originally A[j]. Since we have a merge-inversion, x > y.
And since x is in L and y is in R, x must be within a subarray preceding the
subarray containing y. Therefore x started out in a position i preceding y's
original position j, and so (i, j) is an inversion.

Having shown a one-to-one correspondence between inversions and
merge-inversions, it suffices for us to count merge-inversions.

Consider a merge-inversion involving y in R. Let z be the smallest value in L
that is greater than y. At some point during the merging process, z and y will
be the "exposed" values in L and R, i.e., we will have z = L[i] and y = R[j]
in line 13 of Merge. At that time, there will be merge-inversions involving y
and L[i], L[i + 1], L[i + 2], ..., L[n1], and these n1 - i + 1 merge-inversions
will be the only ones involving y. Therefore, we need to detect the first time
that z and y become exposed during the Merge procedure and add the value
of n1 - i + 1 at that time to our total count of merge-inversions.

The following pseudocode, modeled on merge sort, works as we have just
described. It also sorts the array A.

Count-Inversions(A, p, r)
inversions ← 0
if p < r
   then q ← ⌊(p + r)/2⌋
        inversions ← inversions + Count-Inversions(A, p, q)
        inversions ← inversions + Count-Inversions(A, q + 1, r)
        inversions ← inversions + Merge-Inversions(A, p, q, r)
return inversions

Merge-Inversions(A, p, q, r)
n1 ← q - p + 1
n2 ← r - q
create arrays L[1 .. n1 + 1] and R[1 .. n2 + 1]
for i ← 1 to n1
    do L[i] ← A[p + i - 1]
for j ← 1 to n2
    do R[j] ← A[q + j]
L[n1 + 1] ← ∞
R[n2 + 1] ← ∞
i ← 1
j ← 1
inversions ← 0
counted ← FALSE
for k ← p to r
    do if counted = FALSE and R[j] < L[i]
          then inversions ← inversions + n1 - i + 1
               counted ← TRUE
       if L[i] ≤ R[j]
          then A[k] ← L[i]
               i ← i + 1
          else A[k] ← R[j]
               j ← j + 1
               counted ← FALSE
return inversions

The initial call is Count-Inversions(A, 1, n). 

In Merge-Inversions, the boolean variable counted indicates whether we
have counted the merge-inversions involving R[j]. We count them the first time
that both R[j] is exposed and a value greater than R[j] becomes exposed in the
L array. We set counted to FALSE upon each time that a new value becomes
exposed in R. We don't have to worry about merge-inversions involving the
sentinel ∞ in R, since no value in L will be greater than ∞.

Since we have added only a constant amount of additional work to each
procedure call and to each iteration of the last for loop of the merging procedure,
the total running time of the above pseudocode is the same as for merge sort:
Θ(n lg n).
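The same counting idea can be written compactly in Python. This sketch is a simplification, not a transcription of the pseudocode above: it drops the sentinels and the counted flag, returns a sorted copy instead of sorting in place, and adds len(left) - i to the count at the moment an element of the right half is copied out (which is when all merge-inversions involving that element are known). The names are illustrative.

```python
def count_inversions(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])
    right, inv_right = count_inversions(a[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than left[i], left[i+1], ..., so it
            # forms a merge-inversion with each of those elements.
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions
```

On the array from part (a), `count_inversions([2, 3, 8, 6, 1])` reports 5 inversions, and on a reverse-sorted array of n elements it reports n(n - 1)/2, matching part (b).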




Lecture Notes for Chapter 3: 
Growth of Functions 


Chapter 3 overview 

• A way to describe behavior of functions in the limit. We’re studying asymptotic 
efficiency. 

• Describe growth of functions. 

• Focus on what’s important by abstracting away low-order terms and constant 
factors. 

• How we indicate running times of algorithms. 

• A way to compare “sizes” of functions: 

O ≈ ≤
Ω ≈ ≥
Θ ≈ =
o ≈ <
ω ≈ >


Asymptotic notation 

O-notation

\[
O(g(n)) = \{ f(n) : \text{there exist positive constants } c \text{ and } n_0 \text{ such that }
0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \} .
\]

g(n) is an asymptotic upper bound for f(n).

If f(n) ∈ O(g(n)), we write f(n) = O(g(n)) (will precisely explain this soon).

Example: 2n^2 = O(n^3), with c = 1 and n_0 = 2.

Examples of functions in O(n^2):
n^2
n^2 + n
n^2 + 1000n
1000n^2 + 1000n
Also,
n
n/1000
n^1.99999
n^2 / lg lg lg n

Ω-notation

\[
\Omega(g(n)) = \{ f(n) : \text{there exist positive constants } c \text{ and } n_0 \text{ such that }
0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0 \} .
\]

g(n) is an asymptotic lower bound for f(n).

Example: √n = Ω(lg n), with c = 1 and n_0 = 16.

Examples of functions in Ω(n^2):
n^2
n^2 + n
n^2 - n
1000n^2 + 1000n
1000n^2 - 1000n
Also,
n^3
n^2.00001
n^2 lg lg lg n
2^(2^n)

Θ-notation

\[
\Theta(g(n)) = \{ f(n) : \text{there exist positive constants } c_1, c_2, \text{ and } n_0 \text{ such that }
0 \le c_1 g(n) \le f(n) \le c_2 g(n) \text{ for all } n \ge n_0 \} .
\]

g(n) is an asymptotically tight bound for f(n).

Example: n^2/2 - 2n = Θ(n^2), with c_1 = 1/4, c_2 = 1/2, and n_0 = 8.
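The constants in this example can be spot-checked numerically. The Python sketch below is only a sanity check over a finite range, not a proof, and the helper name `theta_bounds_hold` is invented for illustration. It also shows that n_0 = 8 is the smallest threshold that works with c_1 = 1/4, since the lower bound fails at n = 7.

```python
def theta_bounds_hold(n, c1=0.25, c2=0.5):
    """Check 0 <= c1*n^2 <= n^2/2 - 2n <= c2*n^2 for one value of n."""
    f = n * n / 2 - 2 * n
    return 0 <= c1 * n * n <= f <= c2 * n * n


# Holds for every n from n_0 = 8 up to the end of the tested range.
assert all(theta_bounds_hold(n) for n in range(8, 10000))

# Fails just below n_0: at n = 7, n^2/4 = 12.25 exceeds n^2/2 - 2n = 10.5.
assert not theta_bounds_hold(7)
```

The lower bound is tight at n = 8, where both n^2/4 and n^2/2 - 2n equal 16.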

Theorem 

f(n) = Θ(g(n)) if and only if f = O(g(n)) and f = Ω(g(n)) .

Leading constants and low-order terms don't matter.


Asymptotic notation in equations 

When on right-hand side: O(n^2) stands for some anonymous function in the set
O(n^2).

2n^2 + 3n + 1 = 2n^2 + Θ(n) means 2n^2 + 3n + 1 = 2n^2 + f(n) for some f(n) ∈ Θ(n).
In particular, f(n) = 3n + 1.

By the way, we interpret # of anonymous functions as = # of times the asymptotic
notation appears:

Σ_{i=1}^{n} O(i)                OK: 1 anonymous function

O(1) + O(2) + ... + O(n)        not OK: n hidden constants
                                ⇒ no clean interpretation

When on left-hand side: No matter how the anonymous functions are chosen on
the left-hand side, there is a way to choose the anonymous functions on the
right-hand side to make the equation valid.

Interpret 2n^2 + Θ(n) = Θ(n^2) as meaning for all functions f(n) ∈ Θ(n), there
exists a function g(n) ∈ Θ(n^2) such that 2n^2 + f(n) = g(n).

Can chain together:

2n^2 + 3n + 1 = 2n^2 + Θ(n)
              = Θ(n^2) .

Interpretation:

• First equation: There exists f(n) ∈ Θ(n) such that 2n^2 + 3n + 1 = 2n^2 + f(n).

• Second equation: For all g(n) ∈ Θ(n) (such as the f(n) used to make the first
equation hold), there exists h(n) ∈ Θ(n^2) such that 2n^2 + g(n) = h(n).
o-notation

\[
o(g(n)) = \{ f(n) : \text{for all constants } c > 0, \text{ there exists a constant } n_0 > 0
\text{ such that } 0 \le f(n) < c\,g(n) \text{ for all } n \ge n_0 \} .
\]

Another view, probably easier to use:

\[ \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 . \]

n^1.9999 = o(n^2)
n^2 / lg n = o(n^2)
n^2 ≠ o(n^2) (just like 2 ≮ 2)
n^2/1000 ≠ o(n^2)


ω-notation

\[
\omega(g(n)) = \{ f(n) : \text{for all constants } c > 0, \text{ there exists a constant } n_0 > 0
\text{ such that } 0 \le c\,g(n) < f(n) \text{ for all } n \ge n_0 \} .
\]

Another view, again, probably easier to use:

\[ \lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty . \]

n^2.0001 = ω(n^2)
n^2 lg n = ω(n^2)
n^2 ≠ ω(n^2)

Comparisons of functions 

Relational properties: 

Transitivity: 

f(n) = Θ(g(n)) and g(n) = Θ(h(n)) ⇒ f(n) = Θ(h(n)).

Same for O, Ω, o, and ω.

Reflexivity:

f(n) = Θ(f(n)).

Same for O and Ω.

Symmetry:

f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n)).

Transpose symmetry:

f(n) = O(g(n)) if and only if g(n) = Ω(f(n)).
f(n) = o(g(n)) if and only if g(n) = ω(f(n)).

Comparisons:

• f(n) is asymptotically smaller than g(n) if f(n) = o(g(n)).

• f(n) is asymptotically larger than g(n) if f(n) = ω(g(n)).

No trichotomy. Although intuitively, we can liken O to ≤, Ω to ≥, etc., unlike
real numbers, where a < b, a = b, or a > b, we might not be able to compare
functions.

Example: n^(1 + sin n) and n, since 1 + sin n oscillates between 0 and 2.

Standard notations and common functions

[You probably do not want to use lecture time going over all the definitions and 
properties given in Section 3.2, but it might be worth spending a few minutes of 
lecture time on some of the following.] 

Monotonicity 

• f(n) is monotonically increasing if m ≤ n ⇒ f(m) ≤ f(n).

• f(n) is monotonically decreasing if m ≤ n ⇒ f(m) ≥ f(n).

• f(n) is strictly increasing if m < n ⇒ f(m) < f(n).

• f(n) is strictly decreasing if m < n ⇒ f(m) > f(n).

Exponentials 

Useful identities:

\begin{align*}
a^{-1} &= 1/a , \\
(a^m)^n &= a^{mn} , \\
a^m a^n &= a^{m+n} .
\end{align*}

Can relate rates of growth of polynomials and exponentials: for all real constants
a and b such that a > 1,

\[ \lim_{n \to \infty} \frac{n^b}{a^n} = 0 , \]

which implies that n^b = o(a^n).

A surprisingly useful inequality: for all real x,

\[ e^x \ge 1 + x . \]

As x gets closer to 0, e^x gets closer to 1 + x.

Logarithms

Notations:

lg n    = log_2 n    (binary logarithm) ,
ln n    = log_e n    (natural logarithm) ,
lg^k n  = (lg n)^k   (exponentiation) ,
lg lg n = lg(lg n)   (composition) .

Logarithm functions apply only to the next term in the formula, so that lg n + k
means (lg n) + k, and not lg(n + k).

In the expression log_b a:

• If we hold b constant, then the expression is strictly increasing as a increases.

• If we hold a constant, then the expression is strictly decreasing as b increases.


Useful identities for all real a > 0, b > 0, c > 0, and n, and where logarithm bases
are not 1:

\begin{align*}
a &= b^{\log_b a} , \\
\log_c(ab) &= \log_c a + \log_c b , \\
\log_b a^n &= n \log_b a , \\
\log_b a &= \frac{\log_c a}{\log_c b} , \\
\log_b(1/a) &= -\log_b a , \\
\log_b a &= \frac{1}{\log_a b} , \\
a^{\log_b c} &= c^{\log_b a} .
\end{align*}


Changing the base of a logarithm from one constant to another only changes the 
value by a constant factor, so we usually don’t worry about logarithm bases in 
asymptotic notation. Convention is to use lg within asymptotic notation, unless the 
base actually matters. 

Just as polynomials grow more slowly than exponentials, logarithms grow more
slowly than polynomials. In lim_{n→∞} n^b / a^n = 0, substitute lg n for n and 2^a for a:

\[
\lim_{n \to \infty} \frac{\lg^b n}{(2^a)^{\lg n}} = \lim_{n \to \infty} \frac{\lg^b n}{n^a} = 0 ,
\]

implying that lg^b n = o(n^a).


Factorials 

n! = 1 · 2 · 3 · · · n. Special case: 0! = 1.
Can use Stirling's approximation,

\[
n! = \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n \left( 1 + \Theta\!\left( \frac{1}{n} \right) \right) ,
\]

to derive that lg(n!) = Θ(n lg n).
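A quick numerical experiment illustrates the Θ(1/n) correction term. The Python sketch below is an illustration only; the function name `stirling_relative_error` is made up, and the remark that the scaled error is close to 1/12 comes from the standard Stirling series rather than from these notes.

```python
import math


def stirling_relative_error(n):
    """Return n! / (sqrt(2*pi*n) * (n/e)^n) - 1 for a positive integer n."""
    approx = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    return math.factorial(n) / approx - 1


# The relative error behaves like a constant over n: multiplying it by n
# gives values that stay close to 1/12 over the tested range.
for n in range(5, 101):
    scaled = n * stirling_relative_error(n)
    assert 1 / 13 < scaled < 1 / 11
```

The range stops at n = 100 so that (n/e)^n stays well within floating-point limits.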
Solution to Exercise 3.1-1


First, let's clarify what the function max(f(n), g(n)) is. Let's define the function
h(n) = max(f(n), g(n)). Then

\[
h(n) = \begin{cases}
f(n) & \text{if } f(n) \ge g(n) , \\
g(n) & \text{if } f(n) < g(n) .
\end{cases}
\]

Since f(n) and g(n) are asymptotically nonnegative, there exists n_0 such that
f(n) ≥ 0 and g(n) ≥ 0 for all n ≥ n_0. Thus for n ≥ n_0, f(n) + g(n) ≥ f(n) ≥ 0
and f(n) + g(n) ≥ g(n) ≥ 0. Since for any particular n, h(n) is either f(n) or g(n),
we have f(n) + g(n) ≥ h(n) ≥ 0, which shows that h(n) = max(f(n), g(n)) ≤
c_2 (f(n) + g(n)) for all n ≥ n_0 (with c_2 = 1 in the definition of Θ).


Similarly, since for any particular n, h(n) is the larger of f(n) and g(n), we have for
all n ≥ n_0, 0 ≤ f(n) ≤ h(n) and 0 ≤ g(n) ≤ h(n). Adding these two inequalities
yields 0 ≤ f(n) + g(n) ≤ 2h(n), or equivalently 0 ≤ (f(n) + g(n))/2 ≤ h(n),
which shows that h(n) = max(f(n), g(n)) ≥ c_1 (f(n) + g(n)) for all n ≥ n_0 (with
c_1 = 1/2 in the definition of Θ).


Solution to Exercise 3.1-2 


To show that (n + a)^b = Θ(n^b), we want to find constants c_1, c_2, n_0 > 0 such that
0 ≤ c_1 n^b ≤ (n + a)^b ≤ c_2 n^b for all n ≥ n_0.

Note that

\begin{align*}
n + a &\le n + |a| \\
      &\le 2n && \text{when } |a| \le n ,
\end{align*}

and

\begin{align*}
n + a &\ge n - |a| \\
      &\ge \tfrac{1}{2} n && \text{when } |a| \le \tfrac{1}{2} n .
\end{align*}

Thus, when n ≥ 2|a|,

\[ 0 \le \tfrac{1}{2} n \le n + a \le 2n . \]

Since b > 0, the inequality still holds when all parts are raised to the power b:

\begin{align*}
0 \le \left( \tfrac{1}{2} n \right)^b &\le (n + a)^b \le (2n)^b , \\
0 \le \left( \tfrac{1}{2} \right)^b n^b &\le (n + a)^b \le 2^b n^b .
\end{align*}

Thus, c_1 = (1/2)^b, c_2 = 2^b, and n_0 = 2|a| satisfy the definition.


Solution to Exercise 3.1-3 

Let the running time be T(n). T(n) ≥ O(n^2) means that T(n) ≥ f(n) for some
function f(n) in the set O(n^2). This statement holds for any running time T(n),
since the function g(n) = 0 for all n is in O(n^2), and running times are always
nonnegative. Thus, the statement tells us nothing about the running time.


Solution to Exercise 3.1-4 


2^(n+1) = O(2^n), but 2^(2n) ≠ O(2^n).

To show that 2^(n+1) = O(2^n), we must find constants c, n_0 > 0 such that
0 ≤ 2^(n+1) ≤ c · 2^n for all n ≥ n_0.

Since 2^(n+1) = 2 · 2^n for all n, we can satisfy the definition with c = 2 and n_0 = 1.

To show that 2^(2n) ≠ O(2^n), assume there exist constants c, n_0 > 0 such that
0 ≤ 2^(2n) ≤ c · 2^n for all n ≥ n_0.

Then 2^(2n) = 2^n · 2^n ≤ c · 2^n ⇒ 2^n ≤ c. But no constant is greater than all 2^n, and
so the assumption leads to a contradiction.


Solution to Exercise 3.1-8 

\[
\Omega(g(n, m)) = \{ f(n, m) : \text{there exist positive constants } c, n_0, \text{ and } m_0
\text{ such that } 0 \le c\,g(n, m) \le f(n, m)
\text{ for all } n \ge n_0 \text{ and } m \ge m_0 \} .
\]

\[
\Theta(g(n, m)) = \{ f(n, m) : \text{there exist positive constants } c_1, c_2, n_0, \text{ and } m_0
\text{ such that } 0 \le c_1 g(n, m) \le f(n, m) \le c_2 g(n, m)
\text{ for all } n \ge n_0 \text{ and } m \ge m_0 \} .
\]
Solution to Exercise 3.2-4

⌈lg n⌉! is not polynomially bounded, but ⌈lg lg n⌉! is.

Proving that a function f(n) is polynomially bounded is equivalent to proving that
lg(f(n)) = O(lg n) for the following reasons.

• If f is polynomially bounded, then there exist constants c, k, n_0 such that for
all n ≥ n_0, f(n) ≤ cn^k. Hence, lg(f(n)) ≤ kc lg n, which, since c and k are
constants, means that lg(f(n)) = O(lg n).

• Similarly, if lg(f(n)) = O(lg n), then f is polynomially bounded.

In the following proofs, we will make use of the following two facts:

1. lg(n!) = Θ(n lg n) (by equation (3.18)).

2. ⌈lg n⌉ = Θ(lg n), because

• ⌈lg n⌉ ≥ lg n

• ⌈lg n⌉ < lg n + 1 ≤ 2 lg n for all n ≥ 2

\begin{align*}
\lg(\lceil \lg n \rceil !) &= \Theta(\lceil \lg n \rceil \lg \lceil \lg n \rceil) \\
&= \Theta(\lg n \lg \lg n) \\
&= \omega(\lg n) .
\end{align*}

Therefore, lg(⌈lg n⌉!) ≠ O(lg n), and so ⌈lg n⌉! is not polynomially bounded.

\begin{align*}
\lg(\lceil \lg \lg n \rceil !) &= \Theta(\lceil \lg \lg n \rceil \lg \lceil \lg \lg n \rceil) \\
&= \Theta(\lg \lg n \lg \lg \lg n) \\
&= o((\lg \lg n)^2) \\
&= o(\lg^2 (\lg n)) \\
&= o(\lg n) .
\end{align*}

The last step above follows from the property that any polylogarithmic function
grows more slowly than any positive polynomial function, i.e., that for constants
a, b > 0, we have lg^b n = o(n^a). Substitute lg n for n, 2 for b, and 1 for a, giving
lg^2(lg n) = o(lg n).

Therefore, lg(⌈lg lg n⌉!) = O(lg n), and so ⌈lg lg n⌉! is polynomially bounded.


Solution to Problem 3-3 

a. Here is the ordering, where functions on the same line are in the same
equivalence class, and those higher on the page are Ω of those below them:

2^(2^(n+1))
2^(2^n)
(n + 1)!
n!                               see justification 7
e^n                              see justification 1
n · 2^n
2^n
(3/2)^n
(lg n)^(lg n) = n^(lg lg n)      see identity 1
(lg n)!                          see justifications 2, 8
n^3
n^2 = 4^(lg n)                   see identity 2
n lg n and lg(n!)                see justification 6
n = 2^(lg n)                     see identity 3
(√2)^(lg n) (= √n)               see identity 6, justification 3
2^(√(2 lg n))                    see identity 5, justification 4
lg^2 n
ln n
√(lg n)
ln ln n                          see justification 5
2^(lg* n)
lg* n and lg*(lg n)              see identity 7
lg(lg* n)
n^(1/lg n) (= 2) and 1           see identity 4

Much of the ranking is based on the following properties: 

• Exponential functions grow faster than polynomial functions, which grow 
faster than polylogarithmic functions. 

• The base of a logarithm doesn’t matter asymptotically, but the base of an 
exponential and the degree of a polynomial do matter. 

We have the following identities: 

1. (lg n)^(lg n) = n^(lg lg n) because a^(log_b c) = c^(log_b a).

2. 4^(lg n) = n^2 because a^(log_b c) = c^(log_b a).

3. 2^(lg n) = n.

4. 2 = n^(1/lg n) by raising identity 3 to the power 1/lg n.

5. 2^(√(2 lg n)) = n^(√(2/lg n)) by raising identity 4 to the power √(2 lg n).

6. (√2)^(lg n) = √n because (√2)^(lg n) = 2^((1/2) lg n) = 2^(lg √n) = √n.

7. lg*(lg n) = (lg* n) - 1.

The following justifications explain some of the rankings:

1. e^n = 2^n (e/2)^n = ω(n 2^n), since (e/2)^n = ω(n).

2. (lg n)! = ω(n^3) by taking logs: lg(lg n)! = Θ(lg n lg lg n) by Stirling's
approximation, lg(n^3) = 3 lg n. lg lg n = ω(3).

3. (√2)^(lg n) = ω(2^(√(2 lg n))) by taking logs: lg (√2)^(lg n) = (1/2) lg n,
lg 2^(√(2 lg n)) = √(2 lg n). (1/2) lg n = ω(√(2 lg n)).

4. 2^(√(2 lg n)) = ω(lg^2 n) by taking logs: lg 2^(√(2 lg n)) = √(2 lg n), lg lg^2 n = 2 lg lg n.
√(2 lg n) = ω(2 lg lg n).

5. ln ln n = ω(2^(lg* n)) by taking logs: lg 2^(lg* n) = lg* n. lg ln ln n = ω(lg* n).

6. lg(n!) = Θ(n lg n) (equation (3.18)).

7. n! = Θ(n^(n+1/2) e^(-n)) by dropping constants and low-order terms in
equation (3.17).

8. (lg n)! = Θ((lg n)^(lg n + 1/2) e^(-lg n)) by substituting lg n for n in the previous
justification. (lg n)! = Θ((lg n)^(lg n + 1/2) n^(-lg e)) because a^(log_b c) = c^(log_b a).

b. The following f(n) is nonnegative, and for all functions g_i(n) in part (a), f(n)
is neither O(g_i(n)) nor Ω(g_i(n)).

\[
f(n) = \begin{cases}
2^{2^{n+2}} & \text{if } n \text{ is even} , \\
0 & \text{if } n \text{ is odd} .
\end{cases}
\]
Chapter 4 overview

A recurrence is a function defined in terms of

• one or more base cases, and 

• itself, with smaller arguments. 

Examples: 

• T(n) = 1             if n = 1,
         T(n - 1) + 1  if n > 1.

Solution: T(n) = n.

• T(n) = 1             if n = 1,
         2T(n/2) + n   if n > 1.

Solution: T(n) = n lg n + n.

• T(n) = 0             if n = 2,
         T(√n) + 1     if n > 2.

Solution: T(n) = lg lg n.

• T(n) = 1                       if n = 1,
         T(n/3) + T(2n/3) + n    if n > 1.

Solution: T(n) = Θ(n lg n).
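
The exact solutions listed above can be spot-checked numerically. The following Python
sketch is not part of the original notes. It evaluates the first three recurrences
directly, and it assumes that n is restricted to values for which n/2 and √n stay
integral (powers of 2, and numbers of the form 2^(2^k)), purely to sidestep floors and
ceilings. The function names are illustrative.

```python
import math

def t_linear(n):
    """T(n) = T(n-1) + 1 with T(1) = 1."""
    return 1 if n == 1 else t_linear(n - 1) + 1

def t_merge(n):
    """T(n) = 2T(n/2) + n with T(1) = 1; n is assumed to be a power of 2."""
    return 1 if n == 1 else 2 * t_merge(n // 2) + n

def t_sqrt(n):
    """T(n) = T(sqrt(n)) + 1 with T(2) = 0; n is assumed to be 2^(2^k)."""
    return 0 if n == 2 else t_sqrt(math.isqrt(n)) + 1

for k in range(1, 6):
    n = 2 ** k                          # lg n = k exactly
    assert t_linear(n) == n             # solution T(n) = n
    assert t_merge(n) == n * k + n      # solution T(n) = n lg n + n

for k in range(0, 5):
    n = 2 ** (2 ** k)                   # 2, 4, 16, 256, 65536
    assert t_sqrt(n) == k               # solution T(n) = lg lg n
```

A check like this only confirms a guess on sample inputs; the substitution method is
still what proves it.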

[The notes for this chapter are fairly brief because we teach recurrences in much 
greater detail in a separate discrete math course.] 

Many technical issues: 

• Floors and ceilings 

[Floors and ceilings can easily be removed and don't affect the solution to the 
recurrence. They are better left to a discrete math course.] 

• Exact vs. asymptotic functions 

• Boundary conditions 

In algorithm analysis, we usually express both the recurrence and its solution using
asymptotic notation.

• Example: T(n) = 2T(n/2) + Θ(n), with solution T(n) = Θ(n lg n).

• The boundary conditions are usually expressed as “T(n) = O(1) for sufficiently
small n.”

• When we desire an exact, rather than an asymptotic, solution, we need to deal 
with boundary conditions. 

• In practice, we just use asymptotics most of the time, and we ignore boundary 
conditions. 

[In my course, there are only two acceptable ways of solving recurrences: the
substitution method and the master method. Unless the recursion tree is carefully
accounted for, I do not accept it as a proof of a solution, though I certainly accept
a recursion tree as a way to generate a guess for the substitution method. You may
choose to allow recursion trees as proofs in your course, in which case some of the
substitution proofs in the solutions for this chapter become recursion trees.

I also never use the iteration method, which had appeared in the first edition of
Introduction to Algorithms. I find that it is too easy to make an error in
parenthesization, and that recursion trees give a better intuitive idea than iterating
the recurrence of how the recurrence progresses.]


Substitution method 


1. Guess the solution. 

2. Use induction to find the constants and show that the solution works. 


Example: 

T(n) = 1             if n = 1,
       2T(n/2) + n   if n > 1.

1. Guess: T(n) = n lg n + n. [Here, we have a recurrence with an exact function,
rather than asymptotic notation, and the solution is also exact rather than
asymptotic. We’ll have to check boundary conditions and the base case.]

2. Induction:

Basis: n = 1 ⇒ n lg n + n = 1 = T(n)

Inductive step: Inductive hypothesis is that T(k) = k lg k + k for all k < n.
We’ll use this inductive hypothesis for T(n/2).

T(n) = 2T(n/2) + n
     = 2((n/2) lg(n/2) + n/2) + n    (by inductive hypothesis)
     = n lg(n/2) + n + n
     = n(lg n - lg 2) + n + n
     = n lg n - n + n + n
     = n lg n + n. ■

Generally, we use asymptotic notation:

• We would write T(n) = 2T(n/2) + Θ(n).

• We assume T(n) = O(1) for sufficiently small n.

• We express the solution by asymptotic notation: T(n) = Θ(n lg n).

• We don’t worry about boundary cases, nor do we show base cases in the
substitution proof.

• T(n) is always constant for any constant n. 

• Since we are ultimately interested in an asymptotic solution to a recurrence, 
it will always be possible to choose base cases that work. 

• When we want an asymptotic solution to a recurrence, we don’t worry about 
the base cases in our proofs. 

• When we want an exact solution, then we have to deal with base cases. 

For the substitution method: 

• Name the constant in the additive term. 

• Show the upper (O) and lower (Ω) bounds separately. Might need to use
different constants for each.

Example: T(n) = 2T(n/2) + Θ(n). If we want to show an upper bound of T(n) =
2T(n/2) + O(n), we write T(n) ≤ 2T(n/2) + cn for some positive constant c.

1. Upper bound: 

Guess: T(n) ≤ dn lg n for some positive constant d. We are given c in the
recurrence, and we get to choose d as any positive constant. It’s OK for d to
depend on c.

Substitution:

T(n) ≤ 2T(n/2) + cn
     ≤ 2(d(n/2) lg(n/2)) + cn
     = dn lg(n/2) + cn
     = dn lg n - dn + cn
     ≤ dn lg n    if -dn + cn ≤ 0,
                     d ≥ c

Therefore, T(n) = O(n lg n).

2. Lower bound: Write T(n) ≥ 2T(n/2) + cn for some positive constant c.
Guess: T(n) ≥ dn lg n for some positive constant d.

Substitution:

T(n) ≥ 2T(n/2) + cn
     ≥ 2(d(n/2) lg(n/2)) + cn
     = dn lg(n/2) + cn
     = dn lg n - dn + cn
     ≥ dn lg n    if -dn + cn ≥ 0,
                     d ≤ c

Therefore, T(n) = Ω(n lg n).

Therefore, T(n) = Θ(n lg n). [For this particular recurrence, we can use d = c for
both the upper-bound and lower-bound proofs. That won’t always be the case.] ■

Make sure you show the same exact form when doing a substitution proof. 
Consider the recurrence 

T(n) = 8T(n/2) + Θ(n^2).

For an upper bound:

T(n) ≤ 8T(n/2) + cn^2.

Guess: T(n) ≤ dn^3.

T(n) ≤ 8d(n/2)^3 + cn^2
     = 8d(n^3/8) + cn^2
     = dn^3 + cn^2
     ≰ dn^3    doesn’t work!

Remedy: Subtract off a lower-order term.

Guess: T(n) ≤ dn^3 - d'n^2.

T(n) ≤ 8(d(n/2)^3 - d'(n/2)^2) + cn^2
     = 8d(n^3/8) - 8d'(n^2/4) + cn^2
     = dn^3 - 2d'n^2 + cn^2
     = dn^3 - d'n^2 - d'n^2 + cn^2
     ≤ dn^3 - d'n^2    if -d'n^2 + cn^2 ≤ 0,
                          d' ≥ c
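
To see the strengthened guess at work on concrete numbers, here is a small Python
sketch (not from the original notes). It assumes the exact recurrence
T(n) = 8T(n/2) + n^2 with T(1) = 1, so that c = 1, and it picks d' = 1 and d = 2 as
illustrative constants; n is restricted to powers of 2.

```python
def t_cubic(n):
    """T(n) = 8T(n/2) + n^2 with T(1) = 1; n is assumed to be a power of 2."""
    return 1 if n == 1 else 8 * t_cubic(n // 2) + n * n

c, d, d_prime = 1, 2, 1   # assumed constants: c from the recurrence, d' >= c

for k in range(0, 11):
    n = 2 ** k
    # The guess with the lower-order term subtracted off holds at every n tried.
    assert t_cubic(n) <= d * n**3 - d_prime * n**2
```

Under these assumptions the bound is met with equality, T(n) = 2n^3 - n^2, which is
why the plain guess T(n) ≤ dn^3 cannot be pushed through the induction without the
subtracted term.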

Be careful when using asymptotic notation. 

The false proof for the recurrence T(n) = 4T(n/4) + n, that T(n) = O(n):

T(n) ≤ 4(c(n/4)) + n
     ≤ cn + n
     = O(n)    wrong!

Because we haven’t proven the exact form of our inductive hypothesis (which is
that T(n) ≤ cn), this proof is false.


Recursion trees 

Use to generate a guess. Then verify by substitution method. 

Example: T(n) = T(n/3) + T(2n/3) + Θ(n). For upper bound, rewrite as T(n) ≤
T(n/3) + T(2n/3) + cn; for lower bound, as T(n) ≥ T(n/3) + T(2n/3) + cn.

By summing across each level, the recursion tree shows the cost at each level of
recursion (minus the costs of recursive calls, which appear in subtrees):

Figure (recursion tree): the root costs cn; its children cost c(n/3) and c(2n/3);
the grandchildren cost c(n/9), c(2n/9), c(2n/9), and c(4n/9). Each full level sums
to cn. The leftmost branch peters out after log_3 n levels; the rightmost branch
peters out after log_{3/2} n levels.


• There are log_3 n full levels, and after log_{3/2} n levels, the problem size is down
to 1.

• Each level contributes ≤ cn.

• Lower bound guess: ≥ dn log_3 n = Ω(n lg n) for some positive constant d.

• Upper bound guess: ≤ dn log_{3/2} n = O(n lg n) for some positive constant d.

• Then prove by substitution. 
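
The level-by-level accounting above can be reproduced mechanically. The sketch below
is an illustration rather than part of the original notes: it builds the recursion tree
for T(n) = T(n/3) + T(2n/3) + n with c = 1, treats any node of size at most 1 as a
leaf (an assumption about the boundary), and uses exact fractions so the level sums
can be compared without rounding error.

```python
from fractions import Fraction

def level_costs(n):
    """Per-level cost of the recursion tree for T(n) = T(n/3) + T(2n/3) + n.

    A node of size s costs s. It has children of sizes s/3 and 2s/3
    as long as s > 1; nodes of size <= 1 are treated as leaves.
    """
    level = [Fraction(n)]
    costs = []
    while level:
        costs.append(sum(level))
        level = [child
                 for s in level if s > 1
                 for child in (s / 3, 2 * s / 3)]
    return costs

costs = level_costs(81)

# Every level contributes at most n.
assert all(cost <= 81 for cost in costs)

# Depths 0 through log_3 81 = 4 are full, so each contributes exactly n.
assert all(cost == 81 for cost in costs[:5])
```

For n = 81 the deeper levels contribute strictly less than n, because the short
branches have already bottomed out while the rightmost branch keeps going.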


1. Upper bound: 

Guess: T(n) ≤ dn lg n.

Substitution:

T(n) ≤ T(n/3) + T(2n/3) + cn
     ≤ d(n/3) lg(n/3) + d(2n/3) lg(2n/3) + cn
     = (d(n/3) lg n - d(n/3) lg 3)
         + (d(2n/3) lg n - d(2n/3) lg(3/2)) + cn
     = dn lg n - d((n/3) lg 3 + (2n/3) lg(3/2)) + cn
     = dn lg n - d((n/3) lg 3 + (2n/3) lg 3 - (2n/3) lg 2) + cn
     = dn lg n - dn(lg 3 - 2/3) + cn
     ≤ dn lg n    if -dn(lg 3 - 2/3) + cn ≤ 0,
                     d ≥ c / (lg 3 - 2/3).

Therefore, T(n) = O(n lg n).

Note: Make sure that the symbolic constants used in the recurrence (e.g., c) and 
the guess (e.g., d) are different. 

2. Lower bound: 


Guess: T(n) ≥ dn lg n.

Substitution: Same as for the upper bound, but replacing ≤ by ≥. End up
needing

0 < d ≤ c / (lg 3 - 2/3).

Therefore, T(n) = Ω(n lg n).


Since T(n) = O(n lg n) and T(n) = Ω(n lg n), we conclude that T(n) =
Θ(n lg n). ■

Master method

Used for many divide-and-conquer recurrences of the form

T(n) = aT(n/b) + f(n),

where a ≥ 1, b > 1, and f(n) > 0.

Based on the master theorem (Theorem 4.1).

Compare n^{log_b a} vs. f(n):

Case 1: f(n) = O(n^{log_b a - ε}) for some constant ε > 0.
(f(n) is polynomially smaller than n^{log_b a}.)

Solution: T(n) = Θ(n^{log_b a}).

(Intuitively: cost is dominated by leaves.)

Case 2: f(n) = Θ(n^{log_b a} lg^k n), where k ≥ 0.

[This formulation of Case 2 is more general than in Theorem 4.1, and it is given
in Exercise 4.4-2.]

(f(n) is within a polylog factor of n^{log_b a}, but not smaller.)

Solution: T(n) = Θ(n^{log_b a} lg^{k+1} n).

(Intuitively: cost is n^{log_b a} lg^k n at each level, and there are Θ(lg n) levels.)
Simple case: k = 0 ⇒ f(n) = Θ(n^{log_b a}) ⇒ T(n) = Θ(n^{log_b a} lg n).

Case 3: f(n) = Ω(n^{log_b a + ε}) for some constant ε > 0 and f(n) satisfies the
regularity condition af(n/b) ≤ cf(n) for some constant c < 1 and all sufficiently
large n.

(f(n) is polynomially greater than n^{log_b a}.)

Solution: T(n) = Θ(f(n)).

(Intuitively: cost is dominated by root.)

What’s with the Case 3 regularity condition? 

• Generally not a problem. 

• It always holds whenever f(n) = n^k and f(n) = Ω(n^{log_b a + ε}) for constant
ε > 0. [Proving this makes a nice homework exercise. See below.] So you
don’t need to check it when f(n) is a polynomial.

[Here’s a proof that the regularity condition holds when f(n) = n^k and f(n) =
Ω(n^{log_b a + ε}) for constant ε > 0.

Since f(n) = Ω(n^{log_b a + ε}) and f(n) = n^k, we have that k > log_b a. Using a
base of b and treating both sides as exponents, we have b^k > b^{log_b a} = a, and so
a/b^k < 1. Since a, b, and k are constants, if we let c = a/b^k, then c is a constant
strictly less than 1. We have that af(n/b) = a(n/b)^k = (a/b^k)n^k = cf(n), and so
the regularity condition is satisfied.]

Examples: 

• T(n) = 5T(n/2) + Θ(n^2)
n^{log_2 5} vs. n^2

Since log_2 5 - ε = 2 for some constant ε > 0, use Case 1 ⇒ T(n) = Θ(n^{lg 5})

• T(n) = 27T(n/3) + Θ(n^3 lg n)

n^{log_3 27} = n^3 vs. n^3 lg n

Use Case 2 with k = 1 ⇒ T(n) = Θ(n^3 lg^2 n)

• T(n) = 5T(n/2) + Θ(n^3)

n^{log_2 5} vs. n^3

Now lg 5 + ε = 3 for some constant ε > 0

Check regularity condition (don’t really need to since f(n) is a polynomial):
af(n/b) = 5(n/2)^3 = 5n^3/8 ≤ cn^3 for c = 5/8 < 1
Use Case 3 ⇒ T(n) = Θ(n^3)

• T(n) = 27T(n/3) + Θ(n^3/lg n)

n^{log_3 27} = n^3 vs. n^3/lg n = n^3 lg^{-1} n ≠ Θ(n^3 lg^k n) for any k ≥ 0.

Cannot use the master method.

[We don’t prove the master theorem in our algorithms course. We sometimes prove
a simplified version for recurrences of the form T(n) = aT(n/b) + n^c. Section 4.4
of the text has the full proof of the master theorem.]
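
The case analysis in the examples above is mechanical enough to sketch in code. The
following Python function is an illustration only, with a hypothetical name
(`master_case`) and a deliberately narrow assumption: the driving function has the
form f(n) = Θ(n^k lg^p n). For such f(n) with k > log_b a, the regularity condition
holds for all sufficiently large n, so the sketch does not test it separately.

```python
import math

def master_case(a, b, k, p=0):
    """Classify T(n) = a T(n/b) + Theta(n^k lg^p n) by the master method.

    Returns 1, 2, or 3 for the applicable case, or None when the
    extended master theorem in these notes does not apply.
    """
    crit = math.log(a, b)                 # the exponent log_b a
    if math.isclose(k, crit):
        return 2 if p >= 0 else None      # Case 2 needs lg^p n with p >= 0
    if k < crit:
        return 1                          # f(n) is polynomially smaller
    return 3                              # f(n) is polynomially larger

# The four examples from the notes:
assert master_case(5, 2, 2) == 1          # 5T(n/2) + Theta(n^2)
assert master_case(27, 3, 3, p=1) == 2    # 27T(n/3) + Theta(n^3 lg n)
assert master_case(5, 2, 3) == 3          # 5T(n/2) + Theta(n^3)
assert master_case(27, 3, 3, p=-1) is None  # 27T(n/3) + Theta(n^3 / lg n)
```

Comparing floating-point exponents with `math.isclose` is a pragmatic choice for a
sketch; an exact treatment would compare b^k with a symbolically.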



Solutions for Chapter 4: 
Recurrences 


Solution to Exercise 4.2-2 

The shortest path from the root to a leaf in the recursion tree is n → (1/3)n →
(1/3)^2 n → ··· → 1. Since (1/3)^k n = 1 when k = log_3 n, the height of the
part of the tree in which every node has two children is log_3 n. Since the values at
each of these levels of the tree add up to n, the solution to the recurrence is at least
n log_3 n = Ω(n lg n).


Solution to Exercise 4.2-5 

T(n) = T(αn) + T((1 - α)n) + n

We saw the solution to the recurrence T(n) = T(n/3) + T(2n/3) + cn in the text.
This recurrence can be similarly solved.

Without loss of generality, let α ≥ 1 - α, so that 0 < 1 - α ≤ 1/2 and 1/2 ≤ α < 1.



The recursion tree is full for log_{1/(1-α)} n levels, each contributing cn, so we guess
Ω(n log_{1/(1-α)} n) = Ω(n lg n). It has log_{1/α} n levels, each contributing ≤ cn, so
we guess O(n log_{1/α} n) = O(n lg n).

Now we show that T(n) = Θ(n lg n) by substitution. To prove the upper bound,
we need to show that T(n) ≤ dn lg n for a suitable constant d > 0.

T(n) = T(αn) + T((1 - α)n) + cn
     ≤ dαn lg(αn) + d(1 - α)n lg((1 - α)n) + cn
     = dαn lg α + dαn lg n + d(1 - α)n lg(1 - α) + d(1 - α)n lg n + cn
     = dn lg n + dn(α lg α + (1 - α) lg(1 - α)) + cn
     ≤ dn lg n,

if dn(α lg α + (1 - α) lg(1 - α)) + cn ≤ 0. This condition is equivalent to

d(α lg α + (1 - α) lg(1 - α)) ≤ -c.

Since 1/2 ≤ α < 1 and 0 < 1 - α ≤ 1/2, we have that lg α < 0 and lg(1 - α) < 0.
Thus, α lg α + (1 - α) lg(1 - α) < 0, so that when we divide both sides of the
inequality by this factor, we need to reverse the inequality:

d ≥ -c / (α lg α + (1 - α) lg(1 - α))

or

d ≥ c / (-α lg α - (1 - α) lg(1 - α)).

The fraction on the right-hand side is a positive constant, and so it suffices to pick
any value of d that is greater than or equal to this fraction.

To prove the lower bound, we need to show that T(n) ≥ dn lg n for a suitable
constant d > 0. We can use the same proof as for the upper bound, substituting ≥
for ≤, and we get the requirement that

0 < d ≤ c / (-α lg α - (1 - α) lg(1 - α)).

Therefore, T(n) = Θ(n lg n).

Solution to Problem 4-1 

Note: In parts (a), (b), and (d) below, we are applying case 3 of the master theorem,
which requires the regularity condition that af(n/b) ≤ cf(n) for some constant
c < 1. In each of these parts, f(n) has the form n^k. The regularity condition is
satisfied because af(n/b) = an^k/b^k = (a/b^k)n^k = (a/b^k)f(n), and in each of
the cases below, a/b^k is a constant strictly less than 1.

a. T(n) = 2T(n/2) + n^3 = Θ(n^3). This is a divide-and-conquer recurrence with
a = 2, b = 2, f(n) = n^3, and n^{log_b a} = n^{log_2 2} = n. Since n^3 = Ω(n^{log_2 2 + 2})
and a/b^k = 2/2^3 = 1/4 < 1, case 3 of the master theorem applies, and
T(n) = Θ(n^3).

b. T(n) = T(9n/10) + n = Θ(n). This is a divide-and-conquer recurrence with
a = 1, b = 10/9, f(n) = n, and n^{log_b a} = n^{log_{10/9} 1} = n^0 = 1. Since n =
Ω(n^{log_{10/9} 1 + 1}) and a/b^k = 1/(10/9)^1 = 9/10 < 1, case 3 of the master theorem
applies, and T(n) = Θ(n).

c. T(n) = 16T(n/4) + n^2 = Θ(n^2 lg n). This is another divide-and-conquer
recurrence with a = 16, b = 4, f(n) = n^2, and n^{log_b a} = n^{log_4 16} = n^2. Since
n^2 = Θ(n^{log_4 16}), case 2 of the master theorem applies, and T(n) = Θ(n^2 lg n).

d. T(n) = 7T(n/3) + n^2 = Θ(n^2). This is a divide-and-conquer recurrence with
a = 7, b = 3, f(n) = n^2, and n^{log_b a} = n^{log_3 7}. Since 1 < log_3 7 < 2, we have
that n^2 = Ω(n^{log_3 7 + ε}) for some constant ε > 0. We also have a/b^k = 7/3^2 =
7/9 < 1, so that case 3 of the master theorem applies, and T(n) = Θ(n^2).

e. T(n) = 7T(n/2) + n^2 = Θ(n^{lg 7}). This is a divide-and-conquer recurrence
with a = 7, b = 2, f(n) = n^2, and n^{log_b a} = n^{log_2 7}. Since 2 < lg 7 < 3, we
have that n^2 = O(n^{log_2 7 - ε}) for some constant ε > 0. Thus, case 1 of the master
theorem applies, and T(n) = Θ(n^{lg 7}).

f. T(n) = 2T(n/4) + √n = Θ(√n lg n). This is another divide-and-conquer
recurrence with a = 2, b = 4, f(n) = √n, and n^{log_b a} = n^{log_4 2} = √n.
Since √n = Θ(n^{log_4 2}), case 2 of the master theorem applies, and T(n) =
Θ(√n lg n).

g. T(n) = T(n - 1) + n

Using the recursion tree shown below, we get a guess of T(n) = Θ(n^2).

Figure (recursion tree): a single chain of nodes with costs n, n - 1, n - 2, ..., 2, 1,
for a total of Θ(n^2).

First, we prove the T(n) = Ω(n^2) part by induction. The inductive hypothesis
is T(n) ≥ cn^2 for some constant c > 0.

T(n) = T(n - 1) + n
     ≥ c(n - 1)^2 + n
     = cn^2 - 2cn + c + n
     ≥ cn^2

if -2cn + n + c ≥ 0 or, equivalently, n(1 - 2c) + c ≥ 0. This condition holds
when n ≥ 0 and 0 < c ≤ 1/2.

For the upper bound, T(n) = O(n^2), we use the inductive hypothesis that
T(n) ≤ cn^2 for some constant c > 0. By a similar derivation, we get that
T(n) ≤ cn^2 if -2cn + n + c ≤ 0 or, equivalently, n(1 - 2c) + c ≤ 0. This
condition holds for c = 1 and n ≥ 1.

Thus, T(n) = Ω(n^2) and T(n) = O(n^2), so we conclude that T(n) = Θ(n^2).
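
The two constants found above (c = 1/2 for the lower bound and c = 1 for the upper
bound) can be checked against the recurrence itself. This short Python sketch is an
added illustration; the base case T(1) = 1 is an assumption, since the problem leaves
it unspecified.

```python
def t_quadratic(n):
    """T(n) = T(n-1) + n, with the assumed base case T(1) = 1."""
    total = 1
    for m in range(2, n + 1):
        total += m
    return total

for n in range(1, 200):
    # Lower bound with c = 1/2 and upper bound with c = 1.
    assert n * n / 2 <= t_quadratic(n) <= n * n
```

With this base case the recurrence sums to n(n + 1)/2, which sits between the two
bounds for every n ≥ 1.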
h. T(n) = T(√n) + 1

The easy way to do this is with a change of variables, as on page 66 of
the text. Let m = lg n and S(m) = T(2^m). T(2^m) = T(2^{m/2}) + 1, so
S(m) = S(m/2) + 1. Using the master theorem, n^{log_b a} = n^{log_2 1} = n^0 = 1
and f(n) = 1. Since 1 = Θ(1), case 2 applies and S(m) = Θ(lg m). Therefore,
T(n) = Θ(lg lg n).


Solution to Problem 4-4 


[This problem is solved only for parts a, c, e, f, g, h, and i.]


a. T(n) = 3T(n/2) + n lg n

We have f(n) = n lg n and n^{log_b a} = n^{lg 3} ≈ n^{1.585}. Since n lg n = O(n^{lg 3 - ε})
for any 0 < ε ≤ 0.58, by case 1 of the master theorem, we have T(n) =
Θ(n^{lg 3}).

c. T(n) = 4T(n/2) + n^2 √n

We have f(n) = n^2 √n = n^{5/2} and n^{log_b a} = n^{log_2 4} = n^2. Since n^{5/2} =
Ω(n^{2 + 1/2}), we look at the regularity condition in case 3 of the master theorem.
We have af(n/b) = 4(n/2)^2 √(n/2) = n^{5/2}/√2 ≤ cn^{5/2} for 1/√2 ≤ c < 1.
Case 3 applies, and we have T(n) = Θ(n^2 √n).


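e. T(n) = 2T(n/2) + n/lg n

We can get a guess by means of a recursion tree:

Figure (recursion tree): the root costs n/lg n; its two children each cost
(n/2)/lg(n/2); the four grandchildren each cost (n/4)/lg(n/4). The tree has lg n
levels, with level sums n/lg n, n/(lg n - 1), n/(lg n - 2), ..., and the total is
Σ_{i=0}^{lg n - 1} n/(lg n - i) = Θ(n lg lg n).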


We get the sum on each level by observing that at depth i, we have 2^i nodes,
each with a numerator of n/2^i and a denominator of lg(n/2^i) = lg n - i, so that
the cost at depth i is

2^i · (n/2^i)/(lg n - i) = n/(lg n - i).

The sum for all levels is

Σ_{i=0}^{lg n - 1} n/(lg n - i) = Σ_{i=1}^{lg n} n/i
                                = n Σ_{i=1}^{lg n} 1/i
                                = n · Θ(lg lg n)    (by equation (A.7), the harmonic series)
                                = Θ(n lg lg n).


We can use this analysis as a guess that T(n) = Θ(n lg lg n). If we were to do
a straight substitution proof, it would be rather involved. Instead, we will show
by substitution that T(n) ≤ n(1 + H_{⌊lg n⌋}) and T(n) ≥ n · H_{⌈lg n⌉}, where H_k
is the kth harmonic number: H_k = 1/1 + 1/2 + 1/3 + ··· + 1/k. We also
define H_0 = 0. Since H_k = Θ(lg k), we have that H_{⌊lg n⌋} = Θ(lg ⌊lg n⌋) =
Θ(lg lg n) and H_{⌈lg n⌉} = Θ(lg ⌈lg n⌉) = Θ(lg lg n). Thus, we will have that
T(n) = Θ(n lg lg n).

The base case for the proof is for n = 1, and we use T(1) = 1. Here, lg n = 0,
so that lg n = ⌊lg n⌋ = ⌈lg n⌉. Since H_0 = 0, we have T(1) = 1 ≤ 1(1 + H_0)
and T(1) = 1 ≥ 0 = 1 · H_0.

For the upper bound of T(n) ≤ n(1 + H_{⌊lg n⌋}), we have

T(n) = 2T(n/2) + n/lg n
     ≤ 2((n/2)(1 + H_{⌊lg(n/2)⌋})) + n/lg n
     = n(1 + H_{⌊lg n - 1⌋}) + n/lg n
     = n(1 + H_{⌊lg n⌋ - 1} + 1/lg n)
     ≤ n(1 + H_{⌊lg n⌋ - 1} + 1/⌊lg n⌋)
     = n(1 + H_{⌊lg n⌋}),

where the last line follows from the identity H_k = H_{k-1} + 1/k.

The lower bound of T(n) ≥ n · H_{⌈lg n⌉} is similar:

T(n) = 2T(n/2) + n/lg n
     ≥ 2((n/2) · H_{⌈lg(n/2)⌉}) + n/lg n
     = n · H_{⌈lg n - 1⌉} + n/lg n
     = n · (H_{⌈lg n⌉ - 1} + 1/lg n)
     ≥ n · (H_{⌈lg n⌉ - 1} + 1/⌈lg n⌉)
     = n · H_{⌈lg n⌉}.

Thus, T(n) = Θ(n lg lg n).
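
For powers of 2 the harmonic-number bounds can be verified exactly. The Python sketch
below is an added illustration: it assumes T(1) = 1 as in the base case of the proof,
indexes the recurrence by k = lg n, and uses exact fractions so that no rounding
enters the comparison. The helper names are hypothetical.

```python
from fractions import Fraction

def harmonic(k):
    """H_k = 1/1 + 1/2 + ... + 1/k, with H_0 = 0."""
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))

def t_at_power(k):
    """T(2^k) for T(n) = 2T(n/2) + n/lg n with T(1) = 1."""
    if k == 0:
        return Fraction(1)
    return 2 * t_at_power(k - 1) + Fraction(2 ** k, k)

for k in range(0, 12):
    n = 2 ** k
    # n * H_{lg n} <= T(n) <= n * (1 + H_{lg n})
    assert n * harmonic(k) <= t_at_power(k) <= n * (1 + harmonic(k))
```

When n is an exact power of 2, the floors and ceilings vanish and the upper bound is
attained with equality under this base case.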

f. T(n) = T(n/2) + T(n/4) + T(n/8) + n

Using the recursion tree shown below, we get a guess of T(n) = Θ(n).

We use the substitution method to prove that T(n) = O(n). Our inductive
hypothesis is that T(n) ≤ cn for some constant c > 0. We have

T(n) = T(n/2) + T(n/4) + T(n/8) + n
     ≤ cn/2 + cn/4 + cn/8 + n
     = 7cn/8 + n
     = (1 + 7c/8)n
     ≤ cn    if c ≥ 8.

Therefore, T(n) = O(n).

Showing that T(n) = Ω(n) is easy:

T(n) = T(n/2) + T(n/4) + T(n/8) + n ≥ n.

Since T(n) = O(n) and T(n) = Ω(n), we have that T(n) = Θ(n).

g. T(n) = T(n - 1) + 1/n

This recurrence corresponds to the harmonic series, so that T(n) = H_n, where
H_n = 1/1 + 1/2 + 1/3 + ··· + 1/n. For the base case, we have T(1) = 1 = H_1.
For the inductive step, we assume that T(n - 1) = H_{n-1}, and we have

T(n) = T(n - 1) + 1/n
     = H_{n-1} + 1/n
     = H_n.

Since H_n = Θ(lg n) by equation (A.7), we have that T(n) = Θ(lg n).

h. T(n) = T(n - 1) + lg n

We guess that T(n) = Θ(n lg n). To prove the upper bound, we will show that
T(n) = O(n lg n). Our inductive hypothesis is that T(n) ≤ cn lg n for some
constant c. We have

T(n) = T(n - 1) + lg n
     ≤ c(n - 1) lg(n - 1) + lg n
     = cn lg(n - 1) - c lg(n - 1) + lg n
     ≤ cn lg(n - 1) - c lg(n/2) + lg n
           (since lg(n - 1) ≥ lg(n/2) for n ≥ 2)
     = cn lg(n - 1) - c lg n + c + lg n
     < cn lg n - c lg n + c + lg n
     ≤ cn lg n,

if -c lg n + c + lg n ≤ 0. Equivalently,

-c lg n + c + lg n ≤ 0
c ≤ (c - 1) lg n
lg n ≥ c/(c - 1).

This works for c = 2 and all n ≥ 4.

To prove the lower bound, we will show that T(n) = Ω(n lg n). Our inductive
hypothesis is that T(n) ≥ cn lg n + dn for constants c and d. We have

T(n) = T(n - 1) + lg n
     ≥ c(n - 1) lg(n - 1) + d(n - 1) + lg n
     = cn lg(n - 1) - c lg(n - 1) + dn - d + lg n
     ≥ cn lg(n/2) - c lg(n - 1) + dn - d + lg n
           (since lg(n - 1) ≥ lg(n/2) for n ≥ 2)
     = cn lg n - cn - c lg(n - 1) + dn - d + lg n
     ≥ cn lg n,

if -cn - c lg(n - 1) + dn - d + lg n ≥ 0. Since

-cn - c lg(n - 1) + dn - d + lg n >
-cn - c lg(n - 1) + dn - d + lg(n - 1),

it suffices to find conditions in which -cn - c lg(n - 1) + dn - d + lg(n - 1) ≥ 0.
Equivalently,

(d - c)n ≥ (c - 1) lg(n - 1) + d.

This works for c = 1, d = 2, and all n ≥ 2.

Since T(n) = O(n lg n) and T(n) = Ω(n lg n), we conclude that T(n) =
Θ(n lg n).

i. T(n) = T(n - 2) + 2 lg n

We guess that T(n) = Θ(n lg n). We show the upper bound of T(n) =
O(n lg n) by means of the inductive hypothesis T(n) ≤ cn lg n for some
constant c > 0. We have

T(n) = T(n - 2) + 2 lg n
     ≤ c(n - 2) lg(n - 2) + 2 lg n
     ≤ c(n - 2) lg n + 2 lg n
     = (cn - 2c + 2) lg n
     = cn lg n + (2 - 2c) lg n
     ≤ cn lg n    if c ≥ 1.

Therefore, T(n) = O(n lg n).

For the lower bound of T(n) = Ω(n lg n), we’ll show that T(n) ≥ cn lg n + dn,
for constants c, d > 0 to be chosen. We assume that n ≥ 4, which implies that

1. lg(n - 2) ≥ lg(n/2),

2. n/2 ≥ lg n, and

3. n/2 ≥ 2.

(We’ll use these inequalities as we go along.) We have

T(n) ≥ c(n - 2) lg(n - 2) + d(n - 2) + 2 lg n
     = cn lg(n - 2) - 2c lg(n - 2) + dn - 2d + 2 lg n
     > cn lg(n - 2) - 2c lg n + dn - 2d + 2 lg n
           (since -lg n < -lg(n - 2))
     = cn lg(n - 2) - 2(c - 1) lg n + dn - 2d
     ≥ cn lg(n/2) - 2(c - 1) lg n + dn - 2d    (by inequality (1) above)
     = cn lg n - cn - 2(c - 1) lg n + dn - 2d
     ≥ cn lg n,

if -cn - 2(c - 1) lg n + dn - 2d ≥ 0 or, equivalently, dn ≥ cn + 2(c - 1) lg n + 2d.
Pick any constant c > 1/2, and then pick any constant d such that

d ≥ 2(2c - 1).

(The requirement that c > 1/2 means that d is positive.) Then

d/2 ≥ 2c - 1 = c + (c - 1),

and adding d/2 to both sides, we have

d ≥ c + (c - 1) + d/2.

Multiplying by n yields

dn ≥ cn + (c - 1)n + dn/2,

and then both multiplying and dividing the middle term by 2 gives

dn ≥ cn + 2(c - 1)n/2 + dn/2.

Using inequalities (2) and (3) above, we get

dn ≥ cn + 2(c - 1) lg n + 2d,

which is what we needed to show. Thus T(n) = Ω(n lg n). Since T(n) =
O(n lg n) and T(n) = Ω(n lg n), we conclude that T(n) = Θ(n lg n).

[This chapter introduces probabilistic analysis and randomized algorithms. It
assumes that the student is familiar with the basic probability material in Appendix C.

The primary goals of these notes are to

• explain the difference between probabilistic analysis and randomized
algorithms,

• present the technique of indicator random variables, and

• give another example of the analysis of a randomized algorithm (permuting an
array in place).

These notes omit the technique of permuting an array by sorting, and they omit the
starred Section 5.4.]


The hiring problem 

Scenario: 

• You are using an employment agency to hire a new office assistant. 

• The agency sends you one candidate each day. 

• You interview the candidate and must immediately decide whether or not to
hire that person. But if you hire, you must also fire your current office
assistant, even if it’s someone you have recently hired.

• Cost to interview is c_i per candidate (interview fee paid to agency).

• Cost to hire is c_h per candidate (includes cost to fire current office assistant +
hiring fee paid to agency).

• Assume that c_h > c_i.

• You are committed to having hired, at all times, the best candidate seen so
far. That is, whenever you interview a candidate who is better than your
current office assistant, you must fire the current office assistant and hire the
candidate. Since you must have someone hired at all times, you will always
hire the first candidate that you interview.


Goal: Determine what the price of this strategy will be.

Pseudocode to model this scenario: Assumes that the candidates are numbered 1
to n and that after interviewing each candidate, we can determine if it’s better than
the current office assistant. Uses a dummy candidate 0 that is worse than all others,
so that the first candidate is always hired.

Hire-Assistant(n)
best ← 0    ▷ candidate 0 is a least-qualified dummy candidate
for i ← 1 to n
    do interview candidate i
       if candidate i is better than candidate best
          then best ← i
               hire candidate i

Cost: If n candidates, and we hire m of them, the cost is O(n c_i + m c_h).

• Have to pay n c_i to interview, no matter how many we hire.

• So we focus on analyzing the hiring cost m c_h.

• m c_h varies with each run: it depends on the order in which we interview the
candidates.

• This is a model of a common paradigm: we need to find the maximum or 
minimum in a sequence by examining each element and maintaining a current 
“winner.” The variable m denotes how many times we change our notion of 
which element is currently winning. 
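
The procedure and its cost model translate directly into runnable code. The Python
sketch below is an added illustration, not the notes' own implementation: it assumes
candidates are represented by distinct numeric ranks (larger is better), and the names
`hire_assistant` and `total_cost` are hypothetical.

```python
def hire_assistant(ranks):
    """Return m, the number of hires made by Hire-Assistant.

    ranks[i] is the quality of the (i+1)st candidate interviewed;
    a larger rank means a better candidate (ranks are assumed distinct).
    """
    best = float("-inf")   # plays the role of dummy candidate 0
    hires = 0
    for rank in ranks:     # interview each candidate in order
        if rank > best:    # better than the current office assistant
            best = rank
            hires += 1     # hire this candidate
    return hires

def total_cost(ranks, c_i, c_h):
    """Interview cost for all n candidates plus hiring cost for the m hires."""
    return len(ranks) * c_i + hire_assistant(ranks) * c_h

assert hire_assistant([1, 2, 3, 4, 5, 6]) == 6   # increasing quality: hire everyone
assert hire_assistant([6, 1, 2, 3, 4, 5]) == 1   # best candidate first: one hire
```

The counter `hires` is exactly the variable m from the cost discussion: it counts how
often the running maximum changes.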

Worst-case analysis 

In the worst case, we hire all n candidates. 

This happens if each one is better than all who came before, that is, if the
candidates appear in increasing order of quality.

If we hire all n, then the cost is O(n c_i + n c_h) = O(n c_h) (since c_h > c_i).


Probabilistic analysis 

In general, we have no control over the order in which candidates appear. 

We could assume that they come in a random order: 

• Assign a rank to each candidate: rank(i) is a unique integer in the range 1 to n.

• The ordered list (rank(1), rank(2), ..., rank(n)) is a permutation of the
candidate numbers (1, 2, ..., n).

• The list of ranks is equally likely to be any one of the n! permutations.

• Equivalently, the ranks form a uniform random permutation: each of the
possible n! permutations appears with equal probability.

Essential idea of probabilistic analysis: We must use knowledge of, or make
assumptions about, the distribution of inputs.

• The expectation is over this distribution. 

• This technique requires that we can make a reasonable characterization of the
input distribution.

Randomized algorithms

We might not know the distribution of inputs, or we might not be able to model it
computationally.

Instead, we use randomization within the algorithm in order to impose a
distribution on the inputs.

For the hiring problem: Change the scenario: 

• The employment agency sends us a list of all n candidates in advance. 

• On each day, we randomly choose a candidate from the list to interview (but 
considering only those we have not yet interviewed). 

• Instead of relying on the candidates being presented to us in a random order, 
we take control of the process and enforce a random order. 

What makes an algorithm randomized: An algorithm is randomized if its
behavior is determined in part by values produced by a random-number generator.

• Random(a, b) returns an integer r, where a ≤ r ≤ b and each of the b - a + 1
possible values of r is equally likely.

• In practice, Random is implemented by a pseudorandom-number generator, 
which is a deterministic method returning numbers that “look” random and pass 
statistical tests. 


Indicator random variables 

A simple yet powerful technique for computing the expected value of a random 
variable. 

Helpful in situations in which there may be dependence. 

Given a sample space and an event A, we define the indicator random variable

I{A} = 1  if A occurs,
       0  if A does not occur.

Lemma

For an event A, let X_A = I{A}. Then E[X_A] = Pr{A}.

Proof Letting Ā be the complement of A, we have

E[X_A] = E[I{A}]
       = 1 · Pr{A} + 0 · Pr{Ā}    (definition of expected value)
       = Pr{A}.


■ (lemma)

Simple example: Determine the expected number of heads when we flip a fair coin
one time.

• Sample space is {H, T}.

• Pr{H} = Pr{T} = 1/2.

• Define indicator random variable X_H = I{H}. X_H counts the number of heads
in one flip.

• Since Pr{H} = 1/2, lemma says that E[X_H] = 1/2.

Slightly more complicated example: Determine the expected number of heads in
n coin flips.

• Let X be a random variable for the number of heads in n flips.

• Could compute E[X] = Σ_{k=0}^{n} k · Pr{X = k}. In fact, this is what the book does
in equation (C.36).

• Instead, we’ll use indicator random variables.

• For i = 1, 2, ..., n, define X_i = I{the ith flip results in event H}.

• Then X = Σ_{i=1}^{n} X_i.

• Lemma says that E[X_i] = Pr{H} = 1/2 for i = 1, 2, ..., n.

• Expected number of heads is E[X] = E[Σ_{i=1}^{n} X_i].

• Problem: We want E[Σ_{i=1}^{n} X_i]. We have only the individual expectations
E[X_1], E[X_2], ..., E[X_n].

• Solution: Linearity of expectation says that the expectation of the sum equals
the sum of the expectations. Thus,

E[X] = E[Σ_{i=1}^{n} X_i]
     = Σ_{i=1}^{n} E[X_i]
     = Σ_{i=1}^{n} 1/2
     = n/2.

• Linearity of expectation applies even when there is dependence among the
random variables. [Not an issue in this example, but it can be a great help. The
hat-check problem of Exercise 5.2-4 is a problem with lots of dependence. See
the solution on page 5-10 of this manual.]
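
Both routes to the expected number of heads can be compared exactly. The Python sketch
below is an added illustration: one function evaluates the definition
Σ k · Pr{X = k} using binomial probabilities for a fair coin, the other sums the n
indicator expectations of 1/2 each. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_heads_by_definition(n):
    """E[X] = sum over k of k * Pr{X = k}, where Pr{X = k} = C(n, k) / 2^n."""
    return sum(k * Fraction(comb(n, k), 2 ** n) for k in range(n + 1))

def expected_heads_by_indicators(n):
    """E[X] = E[X_1] + ... + E[X_n] = n * (1/2), by linearity of expectation."""
    return sum(Fraction(1, 2) for _ in range(n))

for n in range(1, 20):
    assert expected_heads_by_definition(n) == expected_heads_by_indicators(n)
    assert expected_heads_by_indicators(n) == Fraction(n, 2)
```

The indicator version never needs the distribution of X, which is the practical
appeal of the technique.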

Analysis of the hiring problem 

Assume that the candidates arrive in a random order. 

Let X be a random variable that equals the number of times we hire a new office 
assistant. 

Define indicator random variables X_1, X_2, ..., X_n, where

X_i = I{candidate i is hired}.

Useful properties:

• X = X_1 + X_2 + ··· + X_n.

• Lemma ⇒ E[X_i] = Pr{candidate i is hired}.

We need to compute Pr{candidate i is hired}.

• Candidate i is hired if and only if candidate i is better than each of candidates
1, 2, ..., i - 1.

• Assumption that the candidates arrive in random order ⇒ candidates 1, 2, ..., i
arrive in random order ⇒ any one of these first i candidates is equally likely to
be the best one so far.

• Thus, Pr{candidate i is the best so far} = 1/i.

• Which implies E[X_i] = 1/i.

Now compute E[X]:

E[X] = E[Σ_{i=1}^{n} X_i]
     = Σ_{i=1}^{n} E[X_i]
     = Σ_{i=1}^{n} 1/i
     = ln n + O(1)    (equation (A.7): the sum is a harmonic series).

Thus, the expected hiring cost is O(c_h ln n), which is much better than the
worst-case cost of O(n c_h).
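
For small n the expectation can be computed by brute force and compared with the
harmonic sum. The Python sketch below is an added illustration under the notes'
assumption that all n! rank orderings are equally likely; it enumerates every ordering,
so it is only practical for small n. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def count_hires(ranks):
    """Number of times the best-so-far changes while scanning ranks (all >= 1)."""
    best, hires = 0, 0
    for rank in ranks:
        if rank > best:
            best, hires = rank, hires + 1
    return hires

def expected_hires(n):
    """Average number of hires over all n! equally likely rank orderings."""
    orderings = list(permutations(range(1, n + 1)))
    return Fraction(sum(count_hires(p) for p in orderings), len(orderings))

def harmonic(n):
    """H_n = 1/1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

for n in range(1, 8):
    assert expected_hires(n) == harmonic(n)   # E[X] = sum of 1/i for i = 1..n
```

The exact agreement reflects the indicator argument: candidate i contributes 1/i to
the expectation regardless of how the other indicators behave.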


Randomized algorithms 

Instead of assuming a distribution of the inputs, we impose a distribution. 


The hiring problem 

For the hiring problem, the algorithm is deterministic: 

• For any given input, the number of times we hire a new office assistant will 
always be the same. 

• The number of times we hire a new office assistant depends only on the input. 

• In fact, it depends only on the ordering of the candidates’ ranks that it is given. 

• Some rank orderings will always produce a high hiring cost. Example: (1, 2, 3, 
4, 5, 6), where each candidate is hired. 

• Some will always produce a low hiring cost. Example: any ordering in which 
the best candidate is the first one interviewed. Then only the best candidate is 
hired. 

• Some may be in between.

Instead of always interviewing the candidates in the order presented, what if we
first randomly permuted this order?

• The randomization is now in the algorithm, not in the input distribution. 

• Given a particular input, we can no longer say what its hiring cost will be. Each 
time we run the algorithm, we can get a different hiring cost. 

• In other words, each time we run the algorithm, the execution depends on the 
random choices made. 

• No particular input always elicits worst-case behavior. 

• Bad behavior occurs only if we get “unlucky” numbers from the random- 
number generator. 

Pseudocode for randomized hiring problem: 

Randomized-Hire-Assistant(n)
    randomly permute the list of candidates
    Hire-Assistant(n)

Lemma 

The expected hiring cost of Randomized-Hire-Assistant is O(c_h ln n).

Proof After permuting the input array, we have a situation identical to the
probabilistic analysis of deterministic Hire-Assistant. ■


Randomly permuting an array 

[The book considers two methods of randomly permuting an n-element array. The
first method assigns a random priority in the range 1 to n^3 to each position and then
reorders the array elements into increasing priority order. We omit this method
from these notes. The second method is better: it works in place (unlike the
priority-based method), it runs in linear time without requiring sorting, and it needs
fewer random bits (n random numbers in the range 1 to n rather than the range 1
to n^3). We present and analyze the second method in these notes.]

Goal: Produce a uniform random permutation (each of the n! permutations is
equally likely to be produced).

Non-goal: Show that for each element A[i], the probability that A[i] moves to
position j is 1/n. (See Exercise 5.3-4, whose solution is on page 5-13 of this
manual.)

The following procedure permutes the array A[1 .. n] in place (i.e., no auxiliary
array is required).

Randomize-In-Place(A, n)
    for i ← 1 to n
        do swap A[i] ↔ A[Random(i, n)]

Idea:

• In iteration i, choose A[i] randomly from A[i .. n].

• Will never alter A[i] after iteration i.

Time: O(1) per iteration ⇒ O(n) total.

Correctness: Given a set of n elements, a k-permutation is a sequence containing
k of the n elements. There are n!/(n − k)! possible k-permutations.

Lemma 

Randomize-In-Place computes a uniform random permutation. 

Proof Use a loop invariant: 

Loop invariant: Just prior to the ith iteration of the for loop, for each
possible (i − 1)-permutation, subarray A[1 .. i − 1] contains this
(i − 1)-permutation with probability (n − i + 1)!/n!.

Initialization: Just before first iteration, i = 1. Loop invariant says that for each
possible 0-permutation, subarray A[1 .. 0] contains this 0-permutation with
probability n!/n! = 1. A[1 .. 0] is an empty subarray, and a 0-permutation
has no elements. So, A[1 .. 0] contains any 0-permutation with probability 1.

Maintenance: Assume that just prior to the ith iteration, each possible
(i − 1)-permutation appears in A[1 .. i − 1] with probability (n − i + 1)!/n!. Will show
that after the ith iteration, each possible i-permutation appears in A[1 .. i] with
probability (n − i)!/n!. Incrementing i for the next iteration then maintains the
invariant.

Consider a particular i-permutation π = ⟨x_1, x_2, ..., x_i⟩. It consists of an
(i − 1)-permutation π' = ⟨x_1, x_2, ..., x_{i−1}⟩, followed by x_i.

Let E_1 be the event that the algorithm actually puts π' into A[1 .. i − 1]. By the
loop invariant, Pr {E_1} = (n − i + 1)!/n!.

Let E_2 be the event that the ith iteration puts x_i into A[i].

We get the i-permutation π in A[1 .. i] if and only if both E_1 and E_2 occur ⇒
the probability that the algorithm produces π in A[1 .. i] is Pr {E_2 ∩ E_1}.

Equation (C.14) ⇒ Pr {E_2 ∩ E_1} = Pr {E_2 | E_1} Pr {E_1}.

The algorithm chooses x_i randomly from the n − i + 1 possibilities in A[i .. n]
⇒ Pr {E_2 | E_1} = 1/(n − i + 1). Thus,

\[
\begin{aligned}
\Pr\{E_2 \cap E_1\} &= \Pr\{E_2 \mid E_1\} \Pr\{E_1\} \\
                    &= \frac{1}{n-i+1} \cdot \frac{(n-i+1)!}{n!} \\
                    &= \frac{(n-i)!}{n!} .
\end{aligned}
\]

Termination: At termination, i = n + 1, so we conclude that A[1 .. n] is a given
n-permutation with probability (n − n)!/n! = 1/n!. ■ (lemma)
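
Randomize-In-Place and the lemma can be illustrated with a short sketch. It assumes 0-origin Python lists (so index i plays the role of i + 1 in the notes) and an injectable random source; the `rand` parameter and the helper `enumerate_outcomes` are illustrative names, not part of the text. Because iteration i has n − i + 1 possible choices, there are n! equally likely choice sequences, and running the procedure once per sequence should produce every permutation exactly once.

```python
import random
from itertools import product


def randomize_in_place(a, rand=random.randint):
    """Permute list a in place. rand(lo, hi) must return an integer in lo..hi inclusive.

    Python lists are 0-origin, so index i here plays the role of i + 1 in the notes.
    """
    n = len(a)
    for i in range(n):
        j = rand(i, n - 1)       # choose A[i] from A[i .. n-1]
        a[i], a[j] = a[j], a[i]  # A[i] is never altered after this iteration


def enumerate_outcomes(n):
    """Run the procedure once for every possible sequence of random choices.

    Iteration i has n - i possible choices, so there are n! sequences in total.
    Returns a dict mapping each resulting permutation to how often it occurs.
    """
    counts = {}
    choice_ranges = [range(i, n) for i in range(n)]
    for choices in product(*choice_ranges):
        scripted = iter(choices)
        a = list(range(1, n + 1))
        randomize_in_place(a, rand=lambda lo, hi: next(scripted))
        key = tuple(a)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Passing the random source as a parameter is a design choice made here so that the procedure can be driven deterministically; ordinary use calls `randomize_in_place(a)` with the default.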

Solution to Exercise 5.1-3

To get an unbiased random bit, given only calls to Biased-Random, call 
Biased-Random twice. Repeatedly do so until the two calls return different 
values, and when this occurs, return the first of the two bits: 

Unbiased-Random
    while true
        do x ← Biased-Random
           y ← Biased-Random
           if x ≠ y
              then return x

To see that Unbiased-Random returns 0 and 1 each with probability 1/2, observe
that the probability that a given iteration returns 0 is

Pr {x = 0 and y = 1} = (1 − p)p ,

and the probability that a given iteration returns 1 is

Pr {x = 1 and y = 0} = p(1 − p) .

(We rely on the bits returned by Biased-Random being independent.) Thus, the
probability that a given iteration returns 0 equals the probability that it returns 1.
Since there is no other way for Unbiased-Random to return a value, it returns 0
and 1 each with probability 1/2.

Assuming that each iteration takes O(1) time, the expected running time of
Unbiased-Random is linear in the expected number of iterations. We can view
each iteration as a Bernoulli trial, where “success” means that the iteration returns
a value. The probability of success equals the probability that 0 is returned plus the
probability that 1 is returned, or 2p(1 − p). The number of trials until a success
occurs is given by the geometric distribution, and by equation (C.31), the expected
number of trials for this scenario is 1/(2p(1 − p)). Thus, the expected running
time of Unbiased-Random is Θ(1/(2p(1 − p))).
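
A minimal sketch of Unbiased-Random follows. It assumes Biased-Random is supplied as a zero-argument function returning independent bits; `make_biased_random` and `expected_iterations` are hypothetical helpers added for illustration only.

```python
import random
from fractions import Fraction


def make_biased_random(p, rng=random.random):
    """Return a function that yields 1 with probability p and 0 otherwise (0 < p < 1)."""
    def biased_random():
        return 1 if rng() < p else 0
    return biased_random


def unbiased_random(biased_random):
    """Call biased_random in pairs until the two bits differ; return the first bit."""
    while True:
        x = biased_random()
        y = biased_random()
        if x != y:
            return x


def expected_iterations(p):
    """Expected number of loop iterations, 1 / (2p(1 - p)), for an exact Fraction p."""
    return 1 / (2 * p * (1 - p))
```

For example, with p = 1/4 the loop is expected to run 8/3 times, and the count grows without bound as p approaches 0 or 1.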

Solution to Exercise 5.2-1

Since Hire-Assistant always hires candidate 1, it hires exactly once if and only
if no candidates other than candidate 1 are hired. This event occurs when
candidate 1 is the best candidate of the n, which occurs with probability 1/n.

Hire-Assistant hires n times if each candidate is better than all those who were
interviewed (and hired) before. This event occurs precisely when the list of ranks
given to the algorithm is ⟨1, 2, ..., n⟩, which occurs with probability 1/n!.


Solution to Exercise 5.2-2 

We make three observations: 

1. Candidate 1 is always hired. 

2. The best candidate, i.e., the one whose rank is n, is always hired. 

3. If the best candidate is candidate 1, then that is the only candidate hired. 

Therefore, in order for Hire-Assistant to hire exactly twice, candidate 1 must
have rank i ≤ n − 1 and all candidates whose ranks are i + 1, i + 2, ..., n − 1 must
be interviewed after the candidate whose rank is n. (When i = n − 1, this second
condition vacuously holds.)

Let E_i be the event in which candidate 1 has rank i; clearly, Pr {E_i} = 1/n for any
given value of i.

Letting j denote the position in the interview order of the best candidate, let F be
the event in which candidates 2, 3, ..., j − 1 have ranks strictly less than the rank
of candidate 1. Given that event E_i has occurred, event F occurs when the best
candidate is the first one interviewed out of the n − i candidates whose ranks are
i + 1, i + 2, ..., n. Thus, Pr {F | E_i} = 1/(n − i).

Our final event is A, which occurs when Hire-Assistant hires exactly twice.
Noting that the events E_1, E_2, ..., E_n are disjoint, we have

\[
\begin{aligned}
A &= F \cap (E_1 \cup E_2 \cup \cdots \cup E_{n-1}) \\
  &= (F \cap E_1) \cup (F \cap E_2) \cup \cdots \cup (F \cap E_{n-1})
\end{aligned}
\]

and

\[
\Pr\{A\} = \sum_{i=1}^{n-1} \Pr\{F \cap E_i\} .
\]

By equation (C.14),

\[
\Pr\{F \cap E_i\} = \Pr\{F \mid E_i\} \Pr\{E_i\} = \frac{1}{n-i} \cdot \frac{1}{n} ,
\]

and so

\[
\begin{aligned}
\Pr\{A\} &= \sum_{i=1}^{n-1} \frac{1}{n-i} \cdot \frac{1}{n} \\
         &= \frac{1}{n} \sum_{i=1}^{n-1} \frac{1}{n-i} \\
         &= \frac{1}{n} \left( \frac{1}{n-1} + \frac{1}{n-2} + \cdots + \frac{1}{1} \right) \\
         &= \frac{1}{n} \cdot H_{n-1} ,
\end{aligned}
\]

where H_{n−1} is the (n − 1)st harmonic number.


Solution to Exercise 5.2-4 

Another way to think of the hat-check problem is that we want to determine the
expected number of fixed points in a random permutation. (A fixed point of a
permutation π is a value i for which π(i) = i.) One could enumerate all n!
permutations, count the total number of fixed points, and divide by n! to determine
the average number of fixed points per permutation. This would be a painstaking
process, and the answer would turn out to be 1. We can use indicator random
variables, however, to arrive at the same answer much more easily.

Define a random variable X that equals the number of customers that get back their
own hat, so that we want to compute E[X].

For i = 1, 2, ..., n, define the indicator random variable

X_i = I {customer i gets back his own hat} .

Then X = X_1 + X_2 + ··· + X_n.

Since the ordering of hats is random, each customer has a probability of 1/n of
getting back his own hat. In other words, Pr {X_i = 1} = 1/n, which, by Lemma 5.1,
implies that E[X_i] = 1/n.

Thus,

\[
\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n} X_i\right] \\
     &= \sum_{i=1}^{n} E[X_i] \quad \text{(linearity of expectation)} \\
     &= \sum_{i=1}^{n} \frac{1}{n} \\
     &= 1 ,
\end{aligned}
\]

and so we expect that exactly 1 customer gets back his own hat.

Note that this is a situation in which the indicator random variables are not
independent. For example, if n = 2 and X_1 = 1, then X_2 must also equal 1.
Conversely, if n = 2 and X_1 = 0, then X_2 must also equal 0. Despite the dependence,
Pr {X_i = 1} = 1/n for all i, and linearity of expectation holds. Thus, we can use
the technique of indicator random variables even in the presence of dependence.
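
The "painstaking" enumeration mentioned above is easy to carry out by machine for small n. The sketch below assumes customers and hats are numbered 0..n−1; the function names are illustrative. It averages the number of fixed points over all n! permutations.

```python
from fractions import Fraction
from itertools import permutations


def fixed_points(perm):
    """Number of positions i (0-origin) with perm[i] == i."""
    return sum(1 for i, value in enumerate(perm) if value == i)


def average_fixed_points(n):
    """Exact average number of fixed points over all n! permutations of 0..n-1."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        total += fixed_points(perm)
        count += 1
    return Fraction(total, count)
```

The average comes out to exactly 1 for every n tried, in agreement with the indicator-variable argument.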


Solution to Exercise 5.2-5 

Let X_ij be an indicator random variable for the event where the pair A[i], A[j]
for i < j is inverted, i.e., A[i] > A[j]. More precisely, we define X_ij =
I {A[i] > A[j]} for 1 ≤ i < j ≤ n. We have Pr {X_ij = 1} = 1/2, because
given two distinct random numbers, the probability that the first is bigger than the
second is 1/2. By Lemma 5.1, E[X_ij] = 1/2.

Let X be the random variable denoting the total number of inverted pairs in the
array, so that

\[
X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij} .
\]

We want the expected number of inverted pairs, so we take the expectation of both
sides of the above equation to obtain

\[
E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] .
\]

We use linearity of expectation to get

\[
\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] \\
     &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}] \\
     &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{1}{2} \\
     &= \binom{n}{2} \cdot \frac{1}{2} \\
     &= \frac{n(n-1)}{2} \cdot \frac{1}{2} \\
     &= \frac{n(n-1)}{4} .
\end{aligned}
\]

Thus the expected number of inverted pairs is n(n − 1)/4.
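
As a check on the n(n − 1)/4 result, the following sketch counts inversions directly and averages over all orderings of n distinct values. It assumes the array holds a uniform random permutation of distinct elements; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def count_inversions(a):
    """Number of pairs i < j with a[i] > a[j], counted directly in O(n^2) time."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def average_inversions(n):
    """Exact average inversion count over all n! orderings of n distinct values."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        total += count_inversions(perm)
        count += 1
    return Fraction(total, count)
```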


Solution to Exercise 5.3-1 


Here’s the rewritten procedure:

Randomize-In-Place(A)
    n ← length[A]
    swap A[1] ↔ A[Random(1, n)]
    for i ← 2 to n
        do swap A[i] ↔ A[Random(i, n)]

The loop invariant becomes

Loop invariant: Just prior to the iteration of the for loop for each value of
i = 2, ..., n, for each possible (i − 1)-permutation, the subarray A[1 .. i − 1]
contains this (i − 1)-permutation with probability (n − i + 1)!/n!.

The maintenance and termination parts remain the same. The initialization part
is for the subarray A[1 .. 1], which contains any 1-permutation with probability
(n − 1)!/n! = 1/n.


Solution to Exercise 5.3-2 

Although Permute-Without-Identity will not produce the identity permutation,
there are other permutations that it fails to produce. For example, consider
its operation when n = 3, when it should be able to produce the n! − 1 = 5
non-identity permutations. The for loop iterates for i = 1 and i = 2. When i = 1, the
call to Random returns one of two possible values (either 2 or 3), and when i = 2,
the call to Random returns just one value (3). Thus, there are only 2 · 1 = 2
possible permutations that Permute-Without-Identity can produce, rather than
the 5 that are required.


Solution to Exercise 5.3-3 


The Permute-With-All procedure does not produce a uniform random
permutation. Consider the permutations it produces when n = 3. There are 3 calls
to Random, each of which returns one of 3 values, and so there are 27 possible
outcomes of calling Permute-With-All. Since there are 3! = 6 permutations,
if Permute-With-All did produce a uniform random permutation, then each
permutation would occur 1/6 of the time. That would mean that each permutation
would have to occur an integer number m times, where m/27 = 1/6. No integer m
satisfies this condition.
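
The counting argument can be verified by enumerating all n^n choice sequences. The sketch below assumes Permute-With-All swaps A[i] with A[Random(1, n)] for i = 1, ..., n, and uses 0-origin indices internally; `outcome_counts` is an illustrative helper, not from the text.

```python
from itertools import product


def permute_with_all(a, choices):
    """Apply Permute-With-All to a copy of a, using a scripted list of 0-origin choices."""
    a = list(a)
    for i, j in enumerate(choices):
        a[i], a[j] = a[j], a[i]  # swap A[i] with A[Random(1, n)]
    return tuple(a)


def outcome_counts(n):
    """Count how often each permutation of 1..n arises over all n^n choice sequences."""
    counts = {}
    for choices in product(range(n), repeat=n):
        result = permute_with_all(range(1, n + 1), choices)
        counts[result] = counts.get(result, 0) + 1
    return counts
```

For n = 3 the 27 outcomes cannot be split evenly among 6 permutations, so the counts returned are necessarily unequal.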

In fact, if we were to work out the possible permutations of ⟨1, 2, 3⟩ and how often
they occur with Permute-With-All, we would get the following probabilities:

permutation    probability
⟨1, 2, 3⟩      4/27
⟨1, 3, 2⟩      5/27
⟨2, 1, 3⟩      5/27
⟨2, 3, 1⟩      5/27
⟨3, 1, 2⟩      4/27
⟨3, 2, 1⟩      4/27

Although these probabilities add to 1, none are equal to 1/6.


Solution to Exercise 5.3-4 

Permute-By-Cyclic chooses offset as a random integer in the range 1 ≤
offset ≤ n, and then it performs a cyclic rotation of the array. That is,
B[((i + offset − 1) mod n) + 1] ← A[i] for i = 1, 2, ..., n. (The subtraction
and addition of 1 in the index calculation is due to the 1-origin indexing. If we
had used 0-origin indexing instead, the index calculation would have simplified to
B[(i + offset) mod n] ← A[i] for i = 0, 1, ..., n − 1.)

Thus, once offset is determined, so is the entire permutation. Since each value of
offset occurs with probability 1/n, each element A[i] ends up in position B[j]
with probability 1/n.

This procedure does not produce a uniform random permutation, however, since
it can produce only n different permutations. Thus, n permutations occur with
probability 1/n, and the remaining n! − n permutations occur with probability 0.
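
A short sketch of the rotation, using the 0-origin form of the index calculation. Here `offset` is passed in explicitly rather than drawn at random so that all n outcomes can be listed; `all_rotations` is an illustrative helper.

```python
def permute_by_cyclic(a, offset):
    """Return the rotation B with B[(i + offset) mod n] = A[i], using 0-origin indexing."""
    n = len(a)
    b = [None] * n
    for i in range(n):
        b[(i + offset) % n] = a[i]
    return b


def all_rotations(a):
    """The n arrays produced as offset ranges over 1..n."""
    return [tuple(permute_by_cyclic(a, offset)) for offset in range(1, len(a) + 1)]
```

Listing the rotations of a 4-element array gives only 4 of the 24 permutations, yet across those 4 outcomes every element visits every position exactly once.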


Solution to Exercise 5.4-6 


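First we determine the expected number of empty bins. We define a random
variable X to be the number of empty bins, so that we want to compute E[X]. Next, for
i = 1, 2, ..., n, we define the indicator random variable Y_i = I {bin i is empty}.
Thus,

\[
X = \sum_{i=1}^{n} Y_i ,
\]

and so

\[
\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n} Y_i\right] \\
     &= \sum_{i=1}^{n} E[Y_i] \quad \text{(by linearity of expectation)} \\
     &= \sum_{i=1}^{n} \Pr\{\text{bin } i \text{ is empty}\} \quad \text{(by Lemma 5.1)} .
\end{aligned}
\]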

Let us focus on a specific bin, say bin i. We view a toss as a success if it misses
bin i and as a failure if it lands in bin i. We have n independent Bernoulli trials,
each with probability of success 1 − 1/n. In order for bin i to be empty, we need
n successes in n trials. Using a binomial distribution, therefore, we have that

\[
\Pr\{\text{bin } i \text{ is empty}\}
  = \binom{n}{n} \left(1 - \frac{1}{n}\right)^{n} \left(\frac{1}{n}\right)^{0}
  = \left(1 - \frac{1}{n}\right)^{n} .
\]

Thus,

\[
E[X] = \sum_{i=1}^{n} \left(1 - \frac{1}{n}\right)^{n} = n \left(1 - \frac{1}{n}\right)^{n} .
\]

By equation (3.13), as n approaches ∞, the quantity (1 − 1/n)^n approaches 1/e,
and so E[X] approaches n/e.


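Now we determine the expected number of bins with exactly one ball. We
redefine X to be the number of bins with exactly one ball, and we redefine Y_i to be
I {bin i gets exactly one ball}. As before, we find that

\[
E[X] = \sum_{i=1}^{n} \Pr\{\text{bin } i \text{ gets exactly one ball}\} .
\]

Again focusing on bin i, we need exactly n − 1 successes in n independent Bernoulli
trials, and so

\[
\begin{aligned}
\Pr\{\text{bin } i \text{ gets exactly one ball}\}
  &= \binom{n}{n-1} \left(1 - \frac{1}{n}\right)^{n-1} \left(\frac{1}{n}\right)^{1} \\
  &= n \cdot \left(1 - \frac{1}{n}\right)^{n-1} \cdot \frac{1}{n} \\
  &= \left(1 - \frac{1}{n}\right)^{n-1} ,
\end{aligned}
\]

and so

\[
E[X] = \sum_{i=1}^{n} \left(1 - \frac{1}{n}\right)^{n-1} = n \left(1 - \frac{1}{n}\right)^{n-1} .
\]

Because

\[
n \left(1 - \frac{1}{n}\right)^{n-1} = \frac{n \left(1 - \frac{1}{n}\right)^{n}}{1 - \frac{1}{n}} ,
\]

as n approaches ∞, we find that E[X] approaches

\[
\frac{n/e}{1 - 1/n} = \frac{n^2}{e(n-1)} .
\]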
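
Both closed forms can be checked exactly for small n by enumerating every way of tossing n balls into n bins. This is a sketch under the assumption that each of the n^n assignments is equally likely; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def expected_bins_with_count(n, k):
    """Exact expected number of bins holding exactly k balls when n balls go into n bins.

    Enumerates all n^n equally likely assignments of balls to bins.
    """
    total = 0
    outcomes = 0
    for tosses in product(range(n), repeat=n):
        total += sum(1 for b in range(n) if tosses.count(b) == k)
        outcomes += 1
    return Fraction(total, outcomes)


def empty_bins_formula(n):
    """n (1 - 1/n)^n, the closed form for the expected number of empty bins."""
    return n * (1 - Fraction(1, n)) ** n


def one_ball_bins_formula(n):
    """n (1 - 1/n)^(n-1), the closed form for bins with exactly one ball."""
    return n * (1 - Fraction(1, n)) ** (n - 1)
```

For n = 3, for instance, the expected number of empty bins is 8/9 and the expected number of bins with exactly one ball is 4/3.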


Solution to Problem 5-1 

a. To determine the expected value represented by the counter after n Increment
operations, we define some random variables:

• For j = 1, 2, ..., n, let X_j denote the increase in the value represented by
the counter due to the jth Increment operation.

• Let V_n be the value represented by the counter after n Increment
operations.

Then V_n = X_1 + X_2 + ··· + X_n. We want to compute E[V_n]. By linearity of
expectation,

E[V_n] = E[X_1 + X_2 + ··· + X_n] = E[X_1] + E[X_2] + ··· + E[X_n] .

We shall show that E[X_j] = 1 for j = 1, 2, ..., n, which will prove that
E[V_n] = n.

We actually show that E[X_j] = 1 in two ways, the second more rigorous than
the first:

1. Suppose that at the start of the jth Increment operation, the counter holds
the value i, which represents n_i. If the counter increases due to this Increment
operation, then the value it represents increases by n_{i+1} − n_i. The
counter increases with probability 1/(n_{i+1} − n_i), and so

\[
\begin{aligned}
E[X_j] &= (0 \cdot \Pr\{\text{counter does not increase}\})
          + ((n_{i+1} - n_i) \cdot \Pr\{\text{counter increases}\}) \\
       &= \left(0 \cdot \left(1 - \frac{1}{n_{i+1} - n_i}\right)\right)
          + \left((n_{i+1} - n_i) \cdot \frac{1}{n_{i+1} - n_i}\right) \\
       &= 1 ,
\end{aligned}
\]

and so E[X_j] = 1 regardless of the value held by the counter.

2. Let C_j be the random variable denoting the value held in the counter at the
start of the jth Increment operation. Since we can ignore values of C_j
greater than 2^b − 1, we use a formula for conditional expectation:

\[
\begin{aligned}
E[X_j] &= E[E[X_j \mid C_j]] \\
       &= \sum_{i=0}^{2^b - 1} E[X_j \mid C_j = i] \cdot \Pr\{C_j = i\} .
\end{aligned}
\]

To compute E[X_j | C_j = i], we note that

• Pr {X_j = 0 | C_j = i} = 1 − 1/(n_{i+1} − n_i),

• Pr {X_j = n_{i+1} − n_i | C_j = i} = 1/(n_{i+1} − n_i), and

• Pr {X_j = k | C_j = i} = 0 for all other k.

Thus,

\[
\begin{aligned}
E[X_j \mid C_j = i] &= \sum_{k} k \cdot \Pr\{X_j = k \mid C_j = i\} \\
  &= \left(0 \cdot \left(1 - \frac{1}{n_{i+1} - n_i}\right)\right)
     + \left((n_{i+1} - n_i) \cdot \frac{1}{n_{i+1} - n_i}\right) \\
  &= 1 .
\end{aligned}
\]

Therefore, noting that

\[
\sum_{i=0}^{2^b - 1} \Pr\{C_j = i\} = 1 ,
\]

we have

\[
E[X_j] = \sum_{i=0}^{2^b - 1} 1 \cdot \Pr\{C_j = i\} = 1 .
\]

Why is the second way more rigorous than the first? Both ways condition on the
value held in the counter, but only the second way incorporates the conditioning
into the expression for E[X_j].


b. Defining V_n and X_j as in part (a), we want to compute Var[V_n], where n_i =
100i. The X_j are pairwise independent, and so by equation (C.28), Var[V_n] =
Var[X_1] + Var[X_2] + ··· + Var[X_n].

Since n_i = 100i, we see that n_{i+1} − n_i = 100(i + 1) − 100i = 100. Therefore,
with probability 99/100, the increase in the value represented by the counter
due to the jth Increment operation is 0, and with probability 1/100, the
value represented increases by 100. Thus, by equation (C.26),

\[
\begin{aligned}
\operatorname{Var}[X_j] &= E[X_j^2] - E^2[X_j] \\
  &= \left(\left(0^2 \cdot \frac{99}{100}\right) + \left(100^2 \cdot \frac{1}{100}\right)\right) - 1^2 \\
  &= 100 - 1 \\
  &= 99 .
\end{aligned}
\]

Summing up the variances of the X_j gives Var[V_n] = 99n.
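
Both parts can be checked exactly by tracking the probability distribution of the counter state. The sketch below is an illustration under stated assumptions: `value_of(i)` stands for n_i and is supplied by the caller, overflow of a b-bit counter is ignored, and the function names are not from the text.

```python
from fractions import Fraction


def counter_distribution(n, value_of):
    """Exact distribution of the counter after n Increment operations.

    value_of(i) is n_i, the value represented by counter state i (assumed increasing).
    Returns a dict mapping counter state i to its probability.
    """
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for i, prob in dist.items():
            p_up = Fraction(1, value_of(i + 1) - value_of(i))
            nxt[i + 1] = nxt.get(i + 1, 0) + prob * p_up
            nxt[i] = nxt.get(i, 0) + prob * (1 - p_up)
        dist = nxt
    return dist


def mean_and_variance(n, value_of):
    """Exact mean and variance of the value V_n represented after n increments."""
    dist = counter_distribution(n, value_of)
    mean = sum(prob * value_of(i) for i, prob in dist.items())
    second = sum(prob * value_of(i) ** 2 for i, prob in dist.items())
    return mean, second - mean ** 2
```

With `value_of = lambda i: 100 * i` the mean is n and the variance is 99n; other increasing choices of n_i with n_0 = 0 change the variance but leave the mean at n.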

Chapter 6 overview

Heapsort 

• O(n lg n) worst case, like merge sort.

• Sorts in place, like insertion sort.

• Combines the best of both algorithms. 

To understand heapsort, we’ll cover heaps and heap operations, and then we’ll take 
a look at priority queues. 


Heaps 


Heap data structure 

• Heap A (not garbage-collected storage) is a nearly complete binary tree.

• Height of node = # of edges on a longest simple path from the node down to
a leaf.

• Height of heap = height of root = Θ(lg n).

• A heap can be stored as an array A.

• Root of tree is A[1].

• Parent of A[i] = A[⌊i/2⌋].

• Left child of A[i] = A[2i].

• Right child of A[i] = A[2i + 1].

• Computing is fast with binary representation implementation.


[In book, have length and heap-size attributes. Here, we bypass these attributes and
use parameter values instead.]

Example: of a max-heap. [Arcs above and below the array on the right go between
parents and children. There is no significance to whether an arc is drawn above or
below the array.]

[Figure: the max-heap drawn as a binary tree, alongside its array representation.]

index:   1   2   3   4   5   6   7   8   9  10
value:  16  14  10   8   7   9   3   2   4   1

Heap property 

• For max-heaps (largest element at root), max-heap property: for all nodes i,
excluding the root, A[Parent(i)] ≥ A[i].

• For min-heaps (smallest element at root), min-heap property: for all nodes i,
excluding the root, A[Parent(i)] ≤ A[i].

By induction and transitivity of ≤, the max-heap property guarantees that the
maximum element of a max-heap is at the root. Similar argument for min-heaps.

The heapsort algorithm we’ll show uses max-heaps.

Note: In general, heaps can be k-ary trees instead of binary.


Maintaining the heap property 

Max-Heapify is important for manipulating max-heaps. It is used to maintain 
the max-heap property. 

• Before Max-Heapify, A[i] may be smaller than its children.

• Assume left and right subtrees of i are max-heaps.

• After Max-Heapify, subtree rooted at i is a max-heap.

Max-Heapify(A, i, n)
    l ← Left(i)
    r ← Right(i)
    if l ≤ n and A[l] > A[i]
        then largest ← l
        else largest ← i
    if r ≤ n and A[r] > A[largest]
        then largest ← r
    if largest ≠ i
        then exchange A[i] ↔ A[largest]
             Max-Heapify(A, largest, n)

[Parameter n replaces attribute heap-size[A].]

The way Max-Heapify works: 

• Compare A[i], A[Left(i)], and A[Right(i)].

• If necessary, swap A[i] with the larger of the two children to preserve heap
property.

• Continue this process of comparing and swapping down the heap, until subtree
rooted at i is max-heap. If we hit a leaf, then the subtree rooted at the leaf is
trivially a max-heap.

Run Max-Heapify on the following heap example.

[Figure: heap example in which node 2 violates the max-heap property.]

• Node 2 violates the max-heap property.

• Compare node 2 with its children, and then swap it with the larger of the two
children.

• Continue down the tree, swapping until the value is properly placed at the root
of a subtree that is a max-heap. In this case, the max-heap is a leaf.

Time: O(lg n).

Correctness: [Instead of book’s formal analysis with recurrence, just come up
with O(lg n) intuitively.] Heap is almost-complete binary tree, hence must
process O(lg n) levels, with constant work at each level (comparing 3 items and maybe
swapping 2).
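
Max-Heapify translates almost line for line into Python. The sketch below keeps the 1-origin indexing of the notes by leaving index 0 of the list unused. The sample input is an assumption chosen for illustration: a heap in which only node 2 (key 4) is out of place, arranged so that the result is the first heap example.

```python
def max_heapify(a, i, n):
    """Restore the max-heap property at node i of the heap a[1 .. n].

    a[0] is an unused placeholder so that indices match the 1-origin pseudocode.
    Assumes the subtrees rooted at the children of i are already max-heaps.
    """
    left = 2 * i
    right = 2 * i + 1
    if left <= n and a[left] > a[i]:
        largest = left
    else:
        largest = i
    if right <= n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, n)


# Hypothetical input: node 2 (key 4) is smaller than its children.
heap = [None, 16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
max_heapify(heap, 2, 10)
```

The key 4 moves from node 2 to node 4 and then to node 9, which is a leaf, so the recursion stops there.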


Building a heap 


The following procedure, given an unordered array, will produce a max-heap.

Build-Max-Heap(A, n)
    for i ← ⌊n/2⌋ downto 1
        do Max-Heapify(A, i, n)

[Parameter n replaces both attributes length[A] and heap-size[A].]

Example: Building a max-heap from the following unsorted array results in the
first heap example.

• i starts off as 5.

• Max-Heapify is applied to subtrees rooted at nodes (in order): 16, 2, 3, 1, 4.


index:   1   2   3   4   5   6   7   8   9  10
value:   4   1   3   2  16   9  10  14   8   7

[Figure: the array above drawn as a binary tree.]

Correctness 

Loop invariant: At start of every iteration of for loop, each node i + 1,
i + 2, ..., n is root of a max-heap.

Initialization: By Exercise 6.1-7, we know that each node ⌊n/2⌋ + 1, ⌊n/2⌋ + 2,
..., n is a leaf, which is the root of a trivial max-heap. Since i = ⌊n/2⌋ before
the first iteration of the for loop, the invariant is initially true.

Maintenance: Children of node i are indexed higher than i, so by the loop
invariant, they are both roots of max-heaps. Correctly assuming that i + 1, i + 2, ..., n
are all roots of max-heaps, Max-Heapify makes node i a max-heap root.
Decrementing i reestablishes the loop invariant at each iteration.

Termination: When i = 0, the loop terminates. By the loop invariant, each node,
notably node 1, is the root of a max-heap.


Analysis 

• Simple bound: O(n) calls to Max-Heapify, each of which takes O(lg n)
time ⇒ O(n lg n). (Note: A good approach to analysis in general is to start by
proving easy bound, then try to tighten it.)

• Tighter analysis: Observation: Time to run Max-Heapify is linear in the
height of the node it’s run on, and most nodes have small heights. Have
≤ ⌈n/2^{h+1}⌉ nodes of height h (see Exercise 6.3-3), and height of heap is ⌊lg n⌋
(Exercise 6.1-2).

The time required by Max-Heapify when called on a node of height h
is O(h), so the total cost of Build-Max-Heap is

\[
\sum_{h=0}^{\lfloor \lg n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
  = O\left(n \sum_{h=0}^{\lfloor \lg n \rfloor} \frac{h}{2^{h}}\right) .
\]

Evaluate the last summation by substituting x = 1/2 in the formula (A.8)
(∑_{k=0}^{∞} k x^k = x/(1 − x)^2), which yields

\[
\sum_{h=0}^{\infty} \frac{h}{2^{h}} = \frac{1/2}{(1 - 1/2)^2} = 2 .
\]

Thus, the running time of Build-Max-Heap is O(n).
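
Build-Max-Heap can be sketched as follows. The code is self-contained, so it repeats Max-Heapify, here written as a loop rather than recursively (a substitution made for this sketch, with the same behavior). Index 0 of the list is an unused placeholder to keep 1-origin indexing, and `is_max_heap` is a hypothetical checker added for illustration.

```python
def max_heapify(a, i, n):
    """Sift a[i] down within the heap a[1 .. n]; a[0] is an unused placeholder."""
    while True:
        left, right, largest = 2 * i, 2 * i + 1, i
        if left <= n and a[left] > a[largest]:
            largest = left
        if right <= n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a, n):
    """Turn the unordered array a[1 .. n] into a max-heap, bottom up."""
    for i in range(n // 2, 0, -1):
        max_heapify(a, i, n)


def is_max_heap(a, n):
    """True if every non-root node i in a[1 .. n] satisfies a[i // 2] >= a[i]."""
    return all(a[i // 2] >= a[i] for i in range(2, n + 1))
```

Applied to the unsorted example array 4, 1, 3, 2, 16, 9, 10, 14, 8, 7, this produces the first heap example.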

Building a min-heap from an unordered array can be done by calling
Min-Heapify instead of Max-Heapify, also taking linear time.


The heapsort algorithm 

Given an input array, the heapsort algorithm acts as follows: 

• Builds a max-heap from the array. 

• Starting with the root (the maximum element), the algorithm places the
maximum element into the correct place in the array by swapping it with the element
in the last position in the array.

• “Discard” this last node (knowing that it is in its correct place) by decreasing the 
heap size, and calling Max-Heapify on the new (possibly incorrectly-placed) 
root. 

• Repeat this “discarding” process until only one node (the smallest element) 
remains, and therefore is in the correct place in the array. 

Heapsort(A, n)
    Build-Max-Heap(A, n)
    for i ← n downto 2
        do exchange A[1] ↔ A[i]
           Max-Heapify(A, 1, i − 1)

[Parameter n replaces length[A], and parameter value i − 1 in Max-Heapify call
replaces decrementing of heap-size[A].]

Example: Sort an example heap on the board. [Nodes with heavy outline are no
longer in the heap.]

[Figure: panels (a) through (e) show the heap after each step of heapsort; the
final sorted array is 1, 2, 3, 4, 7.]


Analysis 

• Build-Max-Heap: O(n)

• for loop: n − 1 times

• exchange elements: O(1)

• Max-Heapify: O(lg n)

Total time: O(n lg n).

Though heapsort is a great algorithm, a well-implemented quicksort usually beats 
it in practice. 
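
The whole algorithm fits in a short, self-contained sketch. One assumption to note: to keep the 1-origin indexing of the notes, the code works on a shifted copy of the input list and writes the result back, whereas the algorithm itself needs no auxiliary array.

```python
def max_heapify(a, i, n):
    """Sift a[i] down within the heap a[1 .. n]; a[0] is an unused placeholder."""
    while True:
        left, right, largest = 2 * i, 2 * i + 1, i
        if left <= n and a[left] > a[largest]:
            largest = left
        if right <= n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heapsort(a):
    """Sort the list a into increasing order and return it."""
    n = len(a)
    heap = [None] + a                    # shifted copy, only for 1-origin indexing
    for i in range(n // 2, 0, -1):       # Build-Max-Heap
        max_heapify(heap, i, n)
    for i in range(n, 1, -1):
        heap[1], heap[i] = heap[i], heap[1]   # move current maximum to its final place
        max_heapify(heap, 1, i - 1)           # heap now occupies heap[1 .. i-1]
    a[:] = heap[1:]
    return a
```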


Heap implementation of priority queue 

Heaps efficiently implement priority queues. These notes will deal with
max-priority queues implemented with max-heaps. Min-priority queues are
implemented with min-heaps similarly.

A heap gives a good compromise between fast insertion but slow extraction and
vice versa. Both operations take O(lg n) time.


Priority queue 

• Maintains a dynamic set S of elements. 

• Each set element has a key —an associated value. 

• Max-priority queue supports dynamic-set operations: 

• Insert(S, x): inserts element x into set S.

• Maximum(S): returns element of S with largest key.

• Extract-Max(S): removes and returns element of S with largest key.

• Increase-Key(S, x, k): increases value of element x’s key to k. Assume
k ≥ x’s current key value.

• Example max-priority queue application: schedule jobs on shared computer.

• Min-priority queue supports similar operations:

• Insert(S, x): inserts element x into set S.

• Minimum(S): returns element of S with smallest key.

• Extract-Min(S): removes and returns element of S with smallest key.

• Decrease-Key(S, x, k): decreases value of element x’s key to k. Assume
k ≤ x’s current key value.

• Example min-priority queue application: event-driven simulator.

Note: Actual implementations often have a handle in each heap element that allows 
access to an object in the application, and objects in the application often have a 
handle (likely an array index) to access the heap element. 

Will examine how to implement max-priority queue operations. 

Finding the maximum element 

Getting the maximum element is easy: it’s the root. 

Heap-Maximum(A)
    return A[1]

Time: Θ(1).

Extracting max element 

Given the array A: 

• Make sure heap is not empty. 

• Make a copy of the maximum element (the root). 

• Make the last node in the tree the new root. 

• Re-heapify the heap, with one fewer node. 

• Return the copy of the maximum element. 

Heap-Extract-Max(A, n)
    if n < 1
        then error “heap underflow”
    max ← A[1]
    A[1] ← A[n]
    Max-Heapify(A, 1, n − 1)    ▷ remakes heap
    return max

[Parameter n replaces heap-size[A], and parameter value n − 1 in Max-Heapify
call replaces decrementing of heap-size[A].]

Analysis: constant time assignments plus time for Max-Heapify.

Time: O(lg n).

Example: Run Heap-Extract-Max on first heap example. 

• Take 16 out of node 1. 

• Move 1 from node 10 to node 1. 

• Erase node 10. 

• Max-Heapify from the root to preserve max-heap property. 

• Note that successive extractions will remove items in reverse sorted order. 


Increasing key value 

Given set S, element x, and new key value k: 

• Make sure k ≥ x’s current key.

• Update x’s key value to k.

• Traverse the tree upward comparing x to its parent and swapping keys if
necessary, until x’s key is no larger than its parent’s key (or x reaches the root).

Heap-Increase-Key(A, i, key)
    if key < A[i]
        then error “new key is smaller than current key”
    A[i] ← key
    while i > 1 and A[Parent(i)] < A[i]
        do exchange A[i] ↔ A[Parent(i)]
           i ← Parent(i)

Analysis: Upward path from node i has length O(lg n) in an n-element heap.

Time: O(lg n).

Example: Increase key of node 9 in first heap example to have value 15. Exchange 
keys of nodes 4 and 9, then of nodes 2 and 4. 


Inserting into the heap 

Given a key k to insert into the heap: 

• Insert a new node in the very last position in the tree with key −∞.

• Increase the −∞ key to k using the Heap-Increase-Key procedure defined
above.

Max-Heap-Insert(A, key, n)
    A[n + 1] ← −∞
    Heap-Increase-Key(A, n + 1, key)

[Parameter n replaces heap-size[A], and use of value n + 1 replaces incrementing
of heap-size[A].]

Analysis: constant time assignments + time for Heap-Increase-Key.

Time: O(lg n).

Min-priority queue operations are implemented similarly with min-heaps. 
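
The four max-priority queue operations can be gathered into one small class. This is a sketch under stated assumptions rather than the book's interface: the class name `MaxPriorityQueue` and its method names are illustrative, the heap stores bare numeric keys (no handles to application objects), and the heap size is tracked by the length of the underlying list instead of a parameter n.

```python
class MaxPriorityQueue:
    """Max-priority queue of keys stored in a 1-origin max-heap (index 0 unused)."""

    def __init__(self, keys=()):
        self.a = [None]
        for key in keys:
            self.insert(key)

    def __len__(self):
        return len(self.a) - 1

    def maximum(self):
        if len(self) < 1:
            raise IndexError("heap underflow")
        return self.a[1]

    def extract_max(self):
        if len(self) < 1:
            raise IndexError("heap underflow")
        top = self.a[1]
        last = self.a.pop()          # remove the last node
        if len(self) >= 1:
            self.a[1] = last         # make it the new root
            self._max_heapify(1)
        return top

    def increase_key(self, i, key):
        if key < self.a[i]:
            raise ValueError("new key is smaller than current key")
        self.a[i] = key
        while i > 1 and self.a[i // 2] < self.a[i]:
            self.a[i], self.a[i // 2] = self.a[i // 2], self.a[i]
            i //= 2

    def insert(self, key):
        self.a.append(float("-inf"))   # new last node with key -infinity
        self.increase_key(len(self), key)

    def _max_heapify(self, i):
        n = len(self)
        while True:
            left, right, largest = 2 * i, 2 * i + 1, i
            if left <= n and self.a[left] > self.a[largest]:
                largest = left
            if right <= n and self.a[right] > self.a[largest]:
                largest = right
            if largest == i:
                return
            self.a[i], self.a[largest] = self.a[largest], self.a[i]
            i = largest
```

Loading the first heap example and calling `increase_key(9, 15)` exchanges nodes 4 and 9, then nodes 2 and 4, matching the example in the notes; repeated calls to `extract_max` return the keys in decreasing order.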



Solutions for Chapter 6: 
Heapsort 


Solution to Exercise 6.1-1 

Since a heap is an almost-complete binary tree (complete at all levels except
possibly the lowest), it has at most 2^{h+1} − 1 elements (if it is complete) and at least
2^h − 1 + 1 = 2^h elements (if the lowest level has just 1 element and the other levels
are complete).


Solution to Exercise 6.1-2 

Given an n-element heap of height h, we know from Exercise 6.1-1 that

2^h ≤ n ≤ 2^{h+1} − 1 < 2^{h+1} .

Thus, h ≤ lg n < h + 1. Since h is an integer, h = ⌊lg n⌋ (by definition of ⌊ ⌋).
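
The bound can be checked numerically. The sketch below computes the height by walking from the last node (index n, which lies on the lowest level) up to the root, assuming the array layout in which node i has parent ⌊i/2⌋; the function name is illustrative.

```python
def heap_height(n):
    """Height of an n-element heap: edges on the path from the root to the last node."""
    height = 0
    i = n              # index of the last node
    while i > 1:
        i //= 2        # move to the parent
        height += 1
    return height
```

For every n ≥ 1 the result h satisfies 2^h ≤ n ≤ 2^{h+1} − 1, which is the same as h = ⌊lg n⌋.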


Solution to Exercise 6.1-3 

Assume the claim is false, i.e., that there is a subtree whose root is not the largest
element in the subtree. Then the maximum element is somewhere else in the
subtree, possibly even at more than one location. Let m be the index at which the
maximum appears (the lowest such index if the maximum appears more than once).
Since the maximum is not at the root of the subtree, node m has a parent. Since
the parent of a node has a lower index than the node, and m was chosen to be the
smallest index of the maximum value, A[Parent(m)] < A[m]. But by the
max-heap property, we must have A[Parent(m)] ≥ A[m]. So our assumption is false,
and the claim is true.


Solution to Exercise 6.2-6 


If you put a value at the root that is less than every value in the left and right
subtrees, then Max-Heapify will be called recursively until a leaf is reached. To
make the recursive calls traverse the longest path to a leaf, choose values that make
Max-Heapify always recurse on the left child. It follows the left branch when the
left child is ≥ the right child, so putting 0 at the root and 1 at all the other nodes, for
example, will accomplish that. With such values, Max-Heapify will be called h
times (where h is the heap height, which is the number of edges in the longest path
from the root to a leaf), so its running time will be Θ(h) (since each call does Θ(1)
work), which is Θ(lg n). Since we have a case in which Max-Heapify’s running
time is Θ(lg n), its worst-case running time is Ω(lg n).


Solution to Exercise 6.3-3 

Let H be the height of the heap. 

Two subtleties to beware of: 

• Be careful not to confuse the height of a node (longest distance from a leaf) 
with its depth (distance from the root). 

• If the heap is not a complete binary tree (bottom level is not full), then the nodes 
at a given level (depth) don’t all have the same height. For example, although all 
nodes at depth H have height 0, nodes at depth H — 1 can have either height 0 
or height 1. 

For a complete binary tree, it’s easy to show that there are ⌈n/2^{h+1}⌉ nodes of
height h. But the proof for an incomplete tree is tricky and is not derived from the
proof for a complete tree.

Proof By induction on h.

Basis: Show that it’s true for h = 0 (i.e., that # of leaves ≤ ⌈n/2^{h+1}⌉ = ⌈n/2⌉).

In fact, we’ll show that the # of leaves = ⌈n/2⌉.

The tree leaves (nodes at height 0) are at depths H and H − 1. They consist of

• all nodes at depth H, and

• the nodes at depth H − 1 that are not parents of depth-H nodes.

Let x be the number of nodes at depth H, that is, the number of nodes in the
bottom (possibly incomplete) level.

Note that n − x is odd, because the n − x nodes above the bottom level form a
complete binary tree, and a complete binary tree has an odd number of nodes (1
less than a power of 2). Thus if n is odd, x is even, and if n is even, x is odd.

To prove the base case, we must consider separately the case in which n is even
(x is odd) and the case in which n is odd (x is even). Here are two ways to do
this: The first requires more cleverness, and the second requires more algebraic
manipulation.

1. First method of proving the base case: 

• If $n$ is odd, then $x$ is even, so all nodes have siblings, i.e., all internal
nodes have 2 children. Thus (see Exercise B.5-3), # of internal nodes =
# of leaves $- 1$.
So, $n$ = # of nodes = # of leaves + # of internal nodes = $2 \cdot$ # of leaves $- 1$.
Thus, # of leaves $= (n + 1)/2 = \lceil n/2 \rceil$. (The latter equality holds because $n$
is odd.)

• If $n$ is even, then $x$ is odd, and some leaf doesn’t have a sibling. If we gave
it a sibling, we would have $n + 1$ nodes, where $n + 1$ is odd, so the case
we analyzed above would apply. Observe that we would also increase the
number of leaves by 1, since we added a node to a parent that already had
a child. By the odd-node case above, # of leaves $+ 1 = \lceil (n + 1)/2 \rceil =
\lceil n/2 \rceil + 1$. (The latter equality holds because $n$ is even.)

In either case, # of leaves $= \lceil n/2 \rceil$.

2. Second method of proving the base case: 

Note that at any depth $d < H$ there are $2^d$ nodes, because all such tree levels
are complete.

• If $x$ is even, there are $x/2$ nodes at depth $H - 1$ that are parents of depth-$H$
nodes, hence $2^{H-1} - x/2$ nodes at depth $H - 1$ that are not parents of depth-$H$
nodes. Thus,

\begin{align*}
\text{total \# of height-0 nodes} &= x + 2^{H-1} - x/2 \\
&= 2^{H-1} + x/2 \\
&= (2^H + x)/2 \\
&= \lceil (2^H + x - 1)/2 \rceil \quad \text{(because $x$ is even)} \\
&= \lceil n/2 \rceil .
\end{align*}

($n = 2^H + x - 1$ because the complete tree down to depth $H - 1$ has $2^H - 1$
nodes and depth $H$ has $x$ nodes.)

• If $x$ is odd, by an argument similar to the even case, we see that

\begin{align*}
\text{\# of height-0 nodes} &= x + 2^{H-1} - (x + 1)/2 \\
&= 2^{H-1} + (x - 1)/2 \\
&= (2^H + x - 1)/2 \\
&= n/2 \\
&= \lceil n/2 \rceil \quad \text{(because $x$ odd $\Rightarrow$ $n$ even)} .
\end{align*}


Inductive step: Show that if it’s true for height $h - 1$, it’s true for $h$.

Let $n_h$ be the number of nodes at height $h$ in the $n$-node tree $T$.

Consider the tree $T'$ formed by removing the leaves of $T$. It has $n' = n - n_0$ nodes.
We know from the base case that $n_0 = \lceil n/2 \rceil$, so $n' = n - n_0 = n - \lceil n/2 \rceil = \lfloor n/2 \rfloor$.

Note that the nodes at height $h$ in $T$ would be at height $h - 1$ if the leaves of the
tree were removed, that is, they are at height $h - 1$ in $T'$. Letting $n'_{h-1}$ denote the
number of nodes at height $h - 1$ in $T'$, we have

$n_h = n'_{h-1}$ .

By induction, we can bound $n'_{h-1}$:

$n_h = n'_{h-1} \leq \lceil n'/2^h \rceil = \lceil \lfloor n/2 \rfloor / 2^h \rceil \leq \lceil (n/2)/2^h \rceil = \lceil n/2^{h+1} \rceil$ . ■


Solution to Exercise 6.5-2



Solution to Problem 6-1 


a. The procedures Build-Max-Heap and Build-Max-Heap' do not always 
create the same heap when run on the same input array. Consider the following 
counterexample. 

Input array: $A = \langle 1, 2, 3 \rangle$.
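
The counterexample can be checked mechanically. The sketch below is an illustration in Python with 0-based indices, not the text's pseudocode; the function names are chosen here for the example. It builds a heap from the input 1, 2, 3 both bottom-up (as Build-Max-Heap does) and by repeated insertion (as Build-Max-Heap' does).

```python
def max_heapify(a, i, size):
    """Sift a[i] down until the max-heap property holds (0-based indices)."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Bottom-up construction, in the manner of Build-Max-Heap."""
    a = list(a)
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))
    return a


def build_max_heap_by_insertion(a):
    """Repeated insertion, in the manner of Build-Max-Heap'."""
    heap = []
    for key in a:
        heap.append(key)
        i = len(heap) - 1
        # Float the new key up while it is larger than its parent.
        while i > 0 and heap[(i - 1) // 2] < heap[i]:
            parent = (i - 1) // 2
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
    return heap


print(build_max_heap([1, 2, 3]))               # [3, 2, 1]
print(build_max_heap_by_insertion([1, 2, 3]))  # [3, 1, 2]
```

Both results are valid max-heaps, but they differ in the order of the two leaves.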



b. An upper bound of $O(n \lg n)$ time follows immediately from there being $n - 1$
calls to Max-Heap-Insert, each taking $O(\lg n)$ time. For a lower bound of
$\Omega(n \lg n)$, consider the case in which the input array is given in strictly increasing
order. Each call to Max-Heap-Insert causes Heap-Increase-Key to
go all the way up to the root. Since the depth of node $i$ is $\lfloor \lg i \rfloor$, the total time is

\begin{align*}
\sum_{i=1}^{n} \Theta(\lfloor \lg i \rfloor) &\geq \sum_{i=\lceil n/2 \rceil}^{n} \Theta(\lfloor \lg \lceil n/2 \rceil \rfloor) \\
&\geq \sum_{i=\lceil n/2 \rceil}^{n} \Theta(\lfloor \lg (n/2) \rfloor) \\
&= \sum_{i=\lceil n/2 \rceil}^{n} \Theta(\lfloor \lg n - 1 \rfloor) \\
&\geq n/2 \cdot \Theta(\lg n) \\
&= \Omega(n \lg n) .
\end{align*}

In the worst case, therefore, Build-Max-Heap' requires $\Theta(n \lg n)$ time to
build an $n$-element heap.


Solution to Problem 6-2 

a. A d-ary heap can be represented in a 1-dimensional array as follows. The root
is kept in $A[1]$, its $d$ children are kept in order in $A[2]$ through $A[d + 1]$, their
children are kept in order in $A[d + 2]$ through $A[d^2 + d + 1]$, and so on. The
following two procedures map a node with index $i$ to its parent and to its $j$th
child (for $1 \leq j \leq d$), respectively.

d-ary-Parent(i)
return ⌊(i - 2)/d + 1⌋

d-ary-Child(i, j)
return d(i - 1) + j + 1

To convince yourself that these procedures really work, verify that
d-ary-Parent(d-ary-Child(i, j)) = i ,

for any $1 \leq j \leq d$. Notice that the binary heap procedures are a special case of
the above procedures when $d = 2$.
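
The suggested verification is easy to run. The following Python sketch transcribes the two index formulas with 1-based indices, as in the text; the function names and the extra parameter `d` are choices made for this illustration. It checks the round-trip identity for a range of values.

```python
def d_ary_parent(i, d):
    """Parent of node i (1-based): floor((i - 2) / d + 1)."""
    return (i - 2) // d + 1


def d_ary_child(i, j, d):
    """j-th child (1 <= j <= d) of node i (1-based): d(i - 1) + j + 1."""
    return d * (i - 1) + j + 1


# Round trip: the parent of any child of i is i again.
for d in range(2, 6):
    for i in range(1, 200):
        for j in range(1, d + 1):
            assert d_ary_parent(d_ary_child(i, j, d), d) == i

# With d = 2 the formulas reduce to the binary-heap ones.
assert d_ary_parent(7, 2) == 7 // 2
assert (d_ary_child(7, 1, 2), d_ary_child(7, 2, 2)) == (14, 15)
```

Integer floor division is enough here because $\lfloor (i - 2)/d + 1 \rfloor = \lfloor (i - 2)/d \rfloor + 1$.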

b. Since each node has $d$ children, the height of a d-ary heap with $n$ nodes is
$\Theta(\log_d n) = \Theta(\lg n / \lg d)$.

c. The procedure Heap-Extract-Max given in the text for binary heaps works
fine for d-ary heaps too. The change needed to support d-ary heaps is in Max-Heapify,
which must compare the argument node to all $d$ children instead of
just 2 children. The running time of Heap-Extract-Max is still the running
time for Max-Heapify, but that now takes worst-case time proportional to the
product of the height of the heap by the number of children examined at each
node (at most $d$), namely $\Theta(d \log_d n) = \Theta(d \lg n / \lg d)$.

d. The procedure Max-Heap-Insert given in the text for binary heaps works
fine for d-ary heaps too. The worst-case running time is still $\Theta(h)$, where $h$
is the height of the heap. (Since only parent pointers are followed, the number
of children a node has is irrelevant.) For a d-ary heap, this is $\Theta(\log_d n) =
\Theta(\lg n / \lg d)$.

e. d-ary-Heap-Increase-Key can be implemented as a slight modification
of Max-Heap-Insert (only the first couple lines are different). Increasing
an element may make it larger than its parent, in which case it must be
moved higher up in the tree. This can be done just as for insertion, traversing
a path from the increased node toward the root. In the worst case, the
entire height of the tree must be traversed, so the worst-case running time is
$\Theta(h) = \Theta(\log_d n) = \Theta(\lg n / \lg d)$.

d-ary-Heap-Increase-Key(A, i, k)
A[i] ← max(A[i], k)
while i > 1 and A[Parent(i)] < A[i]
    do exchange A[i] ↔ A[Parent(i)]
       i ← Parent(i)
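
As a hedged illustration of the procedure above, here is a Python sketch that keeps the text's 1-based indexing by leaving slot 0 of the list unused. The parameter `d` and the sample 3-ary heap are assumptions made for the example.

```python
def d_ary_heap_increase_key(a, i, k, d):
    """Raise a[i] to max(a[i], k), then restore the max-heap property.

    `a` is 1-based: a[0] is an unused placeholder.
    """
    a[i] = max(a[i], k)
    # Walk toward the root, swapping while the parent is smaller.
    while i > 1 and a[(i - 2) // d + 1] < a[i]:
        parent = (i - 2) // d + 1
        a[i], a[parent] = a[parent], a[i]
        i = parent


# A 3-ary max-heap on 8 keys: node 1 has children 2..4, node 2 has 5..7,
# node 3 has child 8.
heap = [None, 50, 40, 30, 20, 10, 5, 4, 3]
d_ary_heap_increase_key(heap, 8, 60, 3)
print(heap[1:])  # [60, 40, 50, 20, 10, 5, 4, 30]
```

The key at index 8 climbs through its parent (index 3) to the root, so two exchanges are made, matching the height of this small heap.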


Chapter 7 overview

[The treatment in the second edition differs from that of the first edition. We use
a different partitioning method, known as “Lomuto partitioning,” in the second
edition, rather than the “Hoare partitioning” used in the first edition. Using Lomuto
partitioning helps simplify the analysis, which uses indicator random variables in
the second edition.]


Quicksort 

• Worst-case running time: $\Theta(n^2)$.

• Expected running time: $\Theta(n \lg n)$.

• Constants hidden in $\Theta(n \lg n)$ are small.

• Sorts in place. 


Description of quicksort 

Quicksort is based on the three-step process of divide-and-conquer. 

• To sort the subarray $A[p..r]$:

Divide: Partition $A[p..r]$ into two (possibly empty) subarrays $A[p..q-1]$
and $A[q+1..r]$, such that each element in the first subarray $A[p..q-1]$
is $\leq A[q]$ and $A[q]$ is $\leq$ each element in the second subarray $A[q+1..r]$.

Conquer: Sort the two subarrays by recursive calls to QUICKSORT. 

Combine: No work is needed to combine the subarrays, because they are sorted 
in place. 

• Perform the divide step by a procedure Partition, which returns the index q 
that marks the position separating the subarrays. 

Quicksort(A, p, r)
if p < r
    then q ← Partition(A, p, r)
         Quicksort(A, p, q - 1)
         Quicksort(A, q + 1, r)

Initial call is Quicksort(A, 1, n).


Partitioning 

Partition subarray $A[p..r]$ by the following procedure:

Partition(A, p, r)
x ← A[r]
i ← p - 1
for j ← p to r - 1
    do if A[j] ≤ x
          then i ← i + 1
               exchange A[i] ↔ A[j]
exchange A[i + 1] ↔ A[r]
return i + 1

• Partition always selects the last element A[r] in the subarray A[p .. r] as the 
pivot—the element around which to partition. 

• As the procedure executes, the array is partitioned into four regions, some of 
which may be empty: 

Loop invariant: 

1. All entries in $A[p..i]$ are $\leq$ pivot.

2. All entries in $A[i+1..j-1]$ are $>$ pivot.

3. $A[r]$ = pivot.

It’s not needed as part of the loop invariant, but the fourth region is $A[j..r-1]$,
whose entries have not yet been examined, and so we don’t know how they
compare to the pivot.


Example: On an 8-element subarray. 

[Figure: successive states of the subarray during Partition, with the positions of
p, i, j, and r marked. Legend:
$A[r]$: pivot
$A[j..r-1]$: not yet examined
$A[i+1..j-1]$: known to be $>$ pivot
$A[p..i]$: known to be $\leq$ pivot]


[The index j disappears because it is no longer needed once the for loop is exited.] 

Correctness: Use the loop invariant to prove correctness of Partition: 

Initialization: Before the loop starts, all the conditions of the loop invariant are
satisfied, because $A[r]$ is the pivot and the subarrays $A[p..i]$ and $A[i+1..j-1]$
are empty.

Maintenance: While the loop is running, if $A[j] \leq$ pivot, then $i$ is incremented,
$A[i]$ and $A[j]$ are swapped, and then $j$ is incremented. If $A[j] >$ pivot, then increment
only $j$.

Termination: When the loop terminates, $j = r$, so all elements in $A$ are partitioned
into one of the three cases: $A[p..i] \leq$ pivot, $A[i+1..r-1] >$ pivot,
and $A[r]$ = pivot.

The last two lines of Partition move the pivot element from the end of the array
to between the two subarrays. This is done by swapping the pivot and the first
element of the second subarray, i.e., by swapping $A[i+1]$ and $A[r]$.

Time for partitioning: $\Theta(n)$ to partition an $n$-element subarray.
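
For readers who want to run the procedures, here is a minimal Python sketch of Partition and Quicksort as described above. It assumes 0-based indices with inclusive bounds (the pseudocode is 1-based), and the sample array is the 8-element one that the average-case discussion in these notes refers to.

```python
def partition(a, p, r):
    """Lomuto partition of a[p..r] (inclusive) around the pivot a[r]."""
    x = a[r]
    i = p - 1                      # a[p..i] holds elements <= pivot
    for j in range(p, r):          # a[i+1..j-1] holds elements > pivot
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # move the pivot between the two regions
    return i + 1


def quicksort(a, p, r):
    """Sort a[p..r] in place."""
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)


data = [8, 1, 6, 4, 0, 3, 9, 5]
q = partition(data, 0, len(data) - 1)
print(q, data)   # 4 [1, 4, 0, 3, 5, 8, 9, 6]

quicksort(data, 0, len(data) - 1)
print(data)      # [0, 1, 3, 4, 5, 6, 8, 9]
```

After the single call to `partition`, every element left of index 4 is at most the pivot 5 and every element right of it is larger, which is exactly the postcondition the loop invariant gives at termination.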










Performance of quicksort 

The running time of quicksort depends on the partitioning of the subarrays: 

• If the subarrays are balanced, then quicksort can run as fast as mergesort. 

• If they are unbalanced, then quicksort can run as slowly as insertion sort. 

Worst case 

• Occurs when the subarrays are completely unbalanced. 

• Have 0 elements in one subarray and n — 1 elements in the other subarray. 

• Get the recurrence 

\begin{align*}
T(n) &= T(n - 1) + T(0) + \Theta(n) \\
&= T(n - 1) + \Theta(n) \\
&= \Theta(n^2) .
\end{align*}

• Same running time as insertion sort. 

• In fact, the worst-case running time occurs when quicksort takes a sorted array 
as input, but insertion sort runs in O(n) time in this case. 
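
The quadratic behavior on sorted input can be observed by counting comparisons. The sketch below is an illustration only: `count_comparisons` is a name chosen here, indices are 0-based, and an explicit stack replaces recursion so that long sorted inputs do not exhaust Python's call stack.

```python
def count_comparisons(values):
    """Number of element-to-pivot comparisons quicksort makes on `values`
    when the last element of each subarray is always the pivot."""
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x, i = a[r], p - 1
        for j in range(p, r):
            comparisons += 1
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        q = i + 1
        stack.append((p, q - 1))
        stack.append((q + 1, r))
    return comparisons


n = 100
print(count_comparisons(range(n)))  # 4950, i.e. n(n - 1)/2
```

On a sorted array of size $m$ the pivot is the maximum, so the split is $m - 1$ versus 0 and the counts $(n - 1) + (n - 2) + \cdots + 1$ add up to $n(n - 1)/2$.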

Best case 

• Occurs when the subarrays are completely balanced every time. 

• Each subarray has $\leq n/2$ elements.

• Get the recurrence

\begin{align*}
T(n) &= 2T(n/2) + \Theta(n) \\
&= \Theta(n \lg n) .
\end{align*}

Balanced partitioning 

• Quicksort’s average running time is much closer to the best case than to the 
worst case. 

• Imagine that Partition always produces a 9-to-1 split.

• Get the recurrence

\begin{align*}
T(n) &\leq T(9n/10) + T(n/10) + \Theta(n) \\
&= O(n \lg n) .
\end{align*}

• Intuition: look at the recursion tree.

• It’s like the one for $T(n) = T(n/3) + T(2n/3) + O(n)$ in Section 4.2.

• Except that here the constants are different; we get $\log_{10} n$ full levels and
$\log_{10/9} n$ levels that are nonempty.

• As long as it’s a constant, the base of the log doesn’t matter in asymptotic
notation.

• Any split of constant proportionality will yield a recursion tree of depth
$\Theta(\lg n)$.

Intuition for the average case

• Splits in the recursion tree will not always be constant. 

• There will usually be a mix of good and bad splits throughout the recursion 
tree. 

• To see that this doesn’t affect the asymptotic running time of quicksort, assume 
that levels alternate between best-case and worst-case splits. 


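[Figure: two recursion trees, each with partitioning cost $\Theta(n)$. Left: a worst-case
split followed by a best-case split. Right: a single best-case split.]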


• The extra level in the left-hand figure only adds to the constant hidden in the
$\Theta$-notation.

• There are still the same number of subarrays to sort, and only twice as much
work was done to get to that point.

• Both figures result in $\Theta(n \lg n)$ time, though the constant for the figure on the
left is higher than that of the figure on the right.


Randomized version of quicksort 

• We have assumed that all input permutations are equally likely. 

• This is not always true. 

• To correct this, we add randomization to quicksort. 

• We could randomly permute the input array. 

• Instead, we use random sampling, or picking one element at random.

• Don’t always use A [r ] as the pivot. Instead, randomly pick an element from the 
subarray that is being sorted. 


Randomized-Partition(A, p, r)
i ← Random(p, r)
exchange A[r] ↔ A[i]
return Partition(A, p, r)

Randomly selecting the pivot element will, on average, cause the split of the input 
array to be reasonably well balanced. 

Randomized-Quicksort(A, p, r)
if p < r
    then q ← Randomized-Partition(A, p, r)
         Randomized-Quicksort(A, p, q - 1)
         Randomized-Quicksort(A, q + 1, r)

Randomization of quicksort stops any specific type of array from causing worst- 
case behavior. For example, an already-sorted array causes worst-case behavior in 
non-randomized Quicksort, but not in Randomized-Quicksort. 
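
A runnable version of the randomized procedures might look like the following Python sketch. It assumes 0-based inclusive indices and uses `random.randint`, whose bounds are inclusive on both ends, in the role of Random(p, r).

```python
import random


def partition(a, p, r):
    """Lomuto partition of a[p..r] (inclusive) around the pivot a[r]."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r):
    """Swap a uniformly chosen element into position r, then partition."""
    i = random.randint(p, r)
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)


def randomized_quicksort(a, p, r):
    """Sort a[p..r] in place with random pivots."""
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)


data = list(range(300))          # already sorted: bad for fixed-pivot quicksort
randomized_quicksort(data, 0, len(data) - 1)
print(data == list(range(300)))  # True
```

No particular input forces the bad splits any more; whether a run is slow depends only on the random choices.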


Analysis of quicksort 

We will analyze 

• the worst-case running time of Quicksort and Randomized-Quicksort
(the same), and

• the expected (average-case) running time of Randomized-Quicksort. 

Worst-case analysis 

We will prove that a worst-case split at every level produces a worst-case running
time of $\Theta(n^2)$.

• Recurrence for the worst-case running time of Quicksort:

$T(n) = \max_{0 \leq q \leq n-1} (T(q) + T(n - q - 1)) + \Theta(n)$ .

• Because Partition produces two subproblems, totaling size $n - 1$, $q$ ranges
from 0 to $n - 1$.

• Guess: $T(n) \leq cn^2$, for some $c$.

• Substituting our guess into the above recurrence:

\begin{align*}
T(n) &\leq \max_{0 \leq q \leq n-1} (cq^2 + c(n - q - 1)^2) + \Theta(n) \\
&= c \cdot \max_{0 \leq q \leq n-1} (q^2 + (n - q - 1)^2) + \Theta(n) .
\end{align*}

• The maximum value of $(q^2 + (n - q - 1)^2)$ occurs when $q$ is either 0 or $n - 1$.
(Second derivative with respect to $q$ is positive.) This means that

\begin{align*}
\max_{0 \leq q \leq n-1} (q^2 + (n - q - 1)^2) &\leq (n - 1)^2 \\
&= n^2 - 2n + 1 .
\end{align*}

• Therefore,

\begin{align*}
T(n) &\leq cn^2 - c(2n - 1) + \Theta(n) \\
&\leq cn^2 \quad \text{if } c(2n - 1) \geq \Theta(n) .
\end{align*}

• Pick $c$ so that $c(2n - 1)$ dominates $\Theta(n)$.

• Therefore, the worst-case running time of quicksort is $O(n^2)$.

• Can also show that the recurrence’s solution is $\Omega(n^2)$. Thus, the worst-case
running time is $\Theta(n^2)$.

Average-case analysis

• The dominant cost of the algorithm is partitioning. 

• Partition removes the pivot element from future consideration each time. 

• Thus, Partition is called at most n times. 

• Quicksort recurses on the partitions. 

• The amount of work that each call to PARTITION does is a constant plus the 
number of comparisons that are performed in its for loop. 

• Let X = the total number of comparisons performed in all calls to PARTITION. 

• Therefore, the total work done over the entire execution is $O(n + X)$.

We will now compute a bound on the overall number of comparisons. 

For ease of analysis: 

• Rename the elements of $A$ as $z_1, z_2, \ldots, z_n$, with $z_i$ being the $i$th smallest
element.

• Define the set $Z_{ij} = \{z_i, z_{i+1}, \ldots, z_j\}$ to be the set of elements between $z_i$
and $z_j$, inclusive.

Each pair of elements is compared at most once, because elements are compared
only to the pivot element, and then the pivot element is never in any later call to
Partition.

Let $X_{ij} = \mathrm{I}\{z_i \text{ is compared to } z_j\}$.

(Considering whether $z_i$ is compared to $z_j$ at any time during the entire quicksort
algorithm, not just during one call of Partition.)

Since each pair is compared at most once, the total number of comparisons performed
by the algorithm is

$X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$ .

Take expectations of both sides, use Lemma 5.1 and linearity of expectation:

\begin{align*}
\mathrm{E}[X] &= \mathrm{E}\left[ \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij} \right] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{E}[X_{ij}] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr\{z_i \text{ is compared to } z_j\} .
\end{align*}

Now all we have to do is find the probability that two elements are compared. 

• Think about when two elements are not compared. 

• For example, numbers in separate partitions will not be compared. 

• In the previous example, the array is $\langle 8, 1, 6, 4, 0, 3, 9, 5 \rangle$ and the pivot is 5, so none
of the set $\{1, 4, 0, 3\}$ will ever be compared to any of the set $\{8, 6, 9\}$.

• Once a pivot $x$ is chosen such that $z_i < x < z_j$, then $z_i$ and $z_j$ will never be
compared at any later time.

• If either $z_i$ or $z_j$ is chosen before any other element of $Z_{ij}$, then it will be
compared to all the elements of $Z_{ij}$, except itself.

• The probability that $z_i$ is compared to $z_j$ is the probability that either $z_i$ or $z_j$ is
the first element chosen.

• There are $j - i + 1$ elements, and pivots are chosen randomly and independently.
Thus, the probability that any particular one of them is the first one chosen is
$1/(j - i + 1)$.


Therefore,

\begin{align*}
\Pr\{z_i \text{ is compared to } z_j\} &= \Pr\{z_i \text{ or } z_j \text{ is the first pivot chosen from } Z_{ij}\} \\
&= \Pr\{z_i \text{ is the first pivot chosen from } Z_{ij}\} \\
&\quad + \Pr\{z_j \text{ is the first pivot chosen from } Z_{ij}\} \\
&= \frac{1}{j - i + 1} + \frac{1}{j - i + 1} \\
&= \frac{2}{j - i + 1} .
\end{align*}

[The second line follows because the two events are mutually exclusive.]

Substituting into the equation for $\mathrm{E}[X]$:

$\mathrm{E}[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}$ .

Evaluate by using a change in variables ($k = j - i$) and the bound on the harmonic
series in equation (A.7):

\begin{align*}
\mathrm{E}[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} \\
&= \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k + 1} \\
&< \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k} \\
&= \sum_{i=1}^{n-1} O(\lg n) \\
&= O(n \lg n) .
\end{align*}

So the expected running time of quicksort, using Randomized-Partition, is
$O(n \lg n)$.
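
The double sum can be evaluated exactly for small $n$, which gives a concrete feel for the bound. The Python sketch below is an illustration (the function name is chosen here); it uses the fact that exactly $n - k$ pairs $(i, j)$ have $j - i = k$, and compares the result with the harmonic-series bound $2(n - 1)H_n$ used in the derivation.

```python
from fractions import Fraction


def expected_comparisons(n):
    """Exact value of sum over 1 <= i < j <= n of 2 / (j - i + 1)."""
    # Group the pairs by k = j - i; there are n - k pairs for each k.
    return sum(Fraction(2 * (n - k), k + 1) for k in range(1, n))


def harmonic(n):
    """The n-th harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


print(expected_comparisons(3))  # 8/3
for n in (10, 100, 1000):
    exact = expected_comparisons(n)
    bound = 2 * (n - 1) * harmonic(n)
    print(n, float(exact), float(bound))
```

For $n = 3$ the value $8/3$ can be confirmed by hand: with probability $1/3$ the middle element is the first pivot (2 comparisons), and otherwise 3 comparisons are made.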


Quicksort


Solution to Exercise 7.2-3 

Partition does a “worst-case partitioning” when the elements are in decreasing
order. It reduces the size of the subarray under consideration by only 1 at each step,
which we’ve seen has running time $\Theta(n^2)$.

In particular, Partition, given a subarray $A[p..r]$ of distinct elements in decreasing
order, produces an empty partition in $A[p..q-1]$, puts the pivot (originally
in $A[r]$) into $A[p]$, and produces a partition $A[p+1..r]$ with only one
fewer element than $A[p..r]$. The recurrence for Quicksort becomes $T(n) =
T(n - 1) + \Theta(n)$, which has the solution $T(n) = \Theta(n^2)$.


Solution to Exercise 7.2-5 

The minimum depth follows a path that always takes the smaller part of the partition,
i.e., that multiplies the number of elements by $\alpha$. One iteration reduces
the number of elements from $n$ to $\alpha n$, and $i$ iterations reduce the number of elements
to $\alpha^i n$. At a leaf, there is just one remaining element, and so at a minimum-depth
leaf of depth $m$, we have $\alpha^m n = 1$. Thus, $\alpha^m = 1/n$. Taking logs, we get
$m \lg \alpha = -\lg n$, or $m = -\lg n / \lg \alpha$.

Similarly, maximum depth corresponds to always taking the larger part of the partition,
i.e., keeping a fraction $1 - \alpha$ of the elements each time. The maximum
depth $M$ is reached when there is one element left, that is, when $(1 - \alpha)^M n = 1$.
Thus, $M = -\lg n / \lg(1 - \alpha)$.

All these equations are approximate because we are ignoring floors and ceilings. 


Solution to Exercise 7.3-1 

We may be interested in the worst-case performance, but in that case, the random¬ 
ization is irrelevant: it won’t improve the worst case. What randomization can do 
is make the chance of encountering a worst-case scenario small. 


Solution to Exercise 7.4-2

To show that quicksort’s best-case running time is $\Omega(n \lg n)$, we use a technique
similar to the one used in Section 7.4.1 to show that its worst-case running time
is $O(n^2)$.

Let $T(n)$ be the best-case time for the procedure Quicksort on an input of size $n$.
We have the recurrence

$T(n) = \min_{1 \leq q \leq n-1} (T(q) + T(n - q - 1)) + \Theta(n)$ .

We guess that $T(n) \geq cn \lg n$ for some constant $c$. Substituting this guess into the
recurrence, we obtain

\begin{align*}
T(n) &\geq \min_{1 \leq q \leq n-1} (cq \lg q + c(n - q - 1) \lg(n - q - 1)) + \Theta(n) \\
&= c \cdot \min_{1 \leq q \leq n-1} (q \lg q + (n - q - 1) \lg(n - q - 1)) + \Theta(n) .
\end{align*}


As we’ll show below, the expression $q \lg q + (n - q - 1) \lg(n - q - 1)$ achieves a
minimum over the range $1 \leq q \leq n - 1$ when $q = n - q - 1$, or $q = (n - 1)/2$, since
the first derivative of the expression with respect to $q$ is 0 when $q = (n - 1)/2$ and
the second derivative of the expression is positive. (It doesn’t matter that $q$ is not
an integer when $n$ is even, since we’re just trying to determine the minimum value
of a function, knowing that when we constrain $q$ to integer values, the function’s
value will be no lower.)


Choosing $q = (n - 1)/2$ gives us the bound

\begin{align*}
\min_{1 \leq q \leq n-1} (q \lg q + (n - q - 1) \lg(n - q - 1))
&\geq \frac{n - 1}{2} \lg \frac{n - 1}{2} + \left( n - \frac{n - 1}{2} - 1 \right) \lg \left( n - \frac{n - 1}{2} - 1 \right) \\
&= (n - 1) \lg \frac{n - 1}{2} .
\end{align*}

Continuing with our bounding of $T(n)$, we obtain, for $n \geq 2$,

\begin{align*}
T(n) &\geq c(n - 1) \lg \frac{n - 1}{2} + \Theta(n) \\
&= c(n - 1) \lg(n - 1) - c(n - 1) + \Theta(n) \\
&= cn \lg(n - 1) - c \lg(n - 1) - c(n - 1) + \Theta(n) \\
&\geq cn \lg(n/2) - c \lg(n - 1) - c(n - 1) + \Theta(n) \quad \text{(since $n \geq 2$)} \\
&= cn \lg n - cn - c \lg(n - 1) - cn + c + \Theta(n) \\
&= cn \lg n - (2cn + c \lg(n - 1) - c) + \Theta(n) \\
&\geq cn \lg n ,
\end{align*}

since we can pick the constant $c$ small enough so that the $\Theta(n)$ term dominates the
quantity $2cn + c \lg(n - 1) - c$. Thus, the best-case running time of quicksort is
$\Omega(n \lg n)$.

Letting $f(q) = q \lg q + (n - q - 1) \lg(n - q - 1)$, we now show how to find
the minimum value of this function in the range $1 \leq q \leq n - 1$. We need to find
the value of $q$ for which the derivative of $f$ with respect to $q$ is 0. We rewrite this
function as

$f(q) = \frac{q \ln q + (n - q - 1) \ln(n - q - 1)}{\ln 2}$ ,

and so

\begin{align*}
f'(q) &= \frac{d}{dq} \left( \frac{q \ln q + (n - q - 1) \ln(n - q - 1)}{\ln 2} \right) \\
&= \frac{\ln q + 1 - \ln(n - q - 1) - 1}{\ln 2} \\
&= \frac{\ln q - \ln(n - q - 1)}{\ln 2} .
\end{align*}

The derivative $f'(q)$ is 0 when $q = n - q - 1$, or when $q = (n - 1)/2$. To verify
that $q = (n - 1)/2$ is indeed a minimum (not a maximum or an inflection point),
we need to check that the second derivative of $f$ is positive at $q = (n - 1)/2$:

\begin{align*}
f''(q) &= \frac{d}{dq} \left( \frac{\ln q - \ln(n - q - 1)}{\ln 2} \right) \\
&= \frac{1}{\ln 2} \left( \frac{1}{q} + \frac{1}{n - q - 1} \right) , \\
f''\left( \frac{n - 1}{2} \right) &= \frac{1}{\ln 2} \left( \frac{2}{n - 1} + \frac{2}{n - 1} \right) \\
&= \frac{1}{\ln 2} \cdot \frac{4}{n - 1} \\
&> 0 \quad \text{(since $n \geq 2$)} .
\end{align*}


Solution to Problem 7-4 

a. Quicksort' does exactly what Quicksort does; hence it sorts correctly. 

Quicksort and Quicksort' do the same partitioning, and then each calls
itself with arguments $A, p, q - 1$. Quicksort then calls itself again, with
arguments $A, q + 1, r$. Quicksort' instead sets $p \leftarrow q + 1$ and performs
another iteration of its while loop. This executes the same operations as calling
itself with $A, q + 1, r$, because in both cases, the first and third arguments ($A$
and $r$) have the same values as before, and $p$ has the old value of $q + 1$.

b. The stack depth of Quicksort' will be $\Theta(n)$ on an $n$-element input array if
there are $\Theta(n)$ recursive calls to Quicksort'. This happens if every call to
Partition(A, p, r) returns $q = r$. The sequence of recursive calls in this
scenario is

Quicksort'(A, 1, n) ,
Quicksort'(A, 1, n - 1) ,
Quicksort'(A, 1, n - 2) ,
...
Quicksort'(A, 1, 1) .

Any array that is already sorted in increasing order will cause Quicksort' to
behave this way.

c. The problem demonstrated by the scenario in part (b) is that each invocation of
Quicksort' calls Quicksort' again with almost the same range. To avoid
such behavior, we must change Quicksort' so that the recursive call is on a
smaller interval of the array. The following variation of Quicksort' checks
which of the two subarrays returned from Partition is smaller and recurses
on the smaller subarray, which is at most half the size of the current array. Since
the array size is reduced by at least half on each recursive call, the number of
recursive calls, and hence the stack depth, is $\Theta(\lg n)$ in the worst case. Note
that this method works no matter how partitioning is performed (as long as
the Partition procedure has the same functionality as the procedure given in
Section 7.1).

Quicksort''(A, p, r)
while p < r
    do ▷ Partition and sort the small subarray first
       q ← Partition(A, p, r)
       if q - p < r - q
          then Quicksort''(A, p, q - 1)
               p ← q + 1
          else Quicksort''(A, q + 1, r)
               r ← q - 1

The expected running time is not affected, because exactly the same work is 
done as before: the same partitions are produced, and the same subarrays are 
sorted. 
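
The stack-depth claim can be checked empirically. The following Python sketch is an illustration under stated assumptions: 0-based inclusive indices, a hypothetical `depth` parameter added only to report the deepest nesting of calls, and the Lomuto `partition` repeated so that the block is self-contained.

```python
import math
import random


def partition(a, p, r):
    """Lomuto partition of a[p..r] (inclusive) around the pivot a[r]."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort_small_first(a, p, r, depth=1):
    """Sort a[p..r] in place, recursing only on the smaller side.

    Returns the deepest call nesting reached (this call counts as `depth`).
    """
    deepest = depth
    while p < r:
        q = partition(a, p, r)
        if q - p < r - q:
            deepest = max(deepest, quicksort_small_first(a, p, q - 1, depth + 1))
            p = q + 1
        else:
            deepest = max(deepest, quicksort_small_first(a, q + 1, r, depth + 1))
            r = q - 1
    return deepest


n = 1000
data = list(range(n))                     # sorted: worst case for the splits
print(quicksort_small_first(data, 0, n - 1))

random.seed(0)
data = [random.randrange(10**6) for _ in range(n)]
depth = quicksort_small_first(data, 0, n - 1)
print(depth <= math.floor(math.log2(n)) + 1)
```

On the sorted input every split is as lopsided as possible, yet the recursive call always receives the empty side, so the nesting stays constant while the loop handles the large side. The running time on that input is still quadratic; only the stack depth improves.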


Chapter 8 overview

How fast can we sort? 

We will prove a lower bound, then beat it by playing a different game. 


Comparison sorting 

• The only operation that may be used to gain order information about a sequence 
is comparison of pairs of elements. 

• All sorts seen so far are comparison sorts: insertion sort, selection sort, merge 
sort, quicksort, heapsort, treesort. 


Lower bounds for sorting 
Lower bounds 

• $\Omega(n)$ to examine all the input.

• All sorts seen so far are $\Omega(n \lg n)$.

• We’ll show that $\Omega(n \lg n)$ is a lower bound for comparison sorts.

Decision tree 

• Abstraction of any comparison sort. 

• Represents comparisons made by 

• a specific sorting algorithm 

• on inputs of a given size. 

• Abstracts away everything else: control and data movement. 

• We’re counting only comparisons. 

For insertion sort on 3 elements:

[Figure: decision tree for insertion sort on 3 elements.]



[Each internal node is labeled by indices of array elements from their original 
positions. Each leaf is labeled by the permutation of orders that the algorithm 
determines.] 

How many leaves on the decision tree? There are $\geq n!$ leaves, because every
permutation appears at least once.

For any comparison sort, 

• 1 tree for each n. 

• View the tree as if the algorithm splits in two at each node, based on the infor¬ 
mation it has determined up to that point. 

• The tree models all possible execution traces. 

What is the length of the longest path from root to leaf? 

• Depends on the algorithm 

• Insertion sort: $\Theta(n^2)$

• Merge sort: $\Theta(n \lg n)$

Lemma

Any binary tree of height $h$ has $\leq 2^h$ leaves.

In other words:

• $l$ = # of leaves,

• $h$ = height,

• Then $l \leq 2^h$.

(We’ll prove this lemma later.)

Why is this useful?

Theorem

Any decision tree that sorts $n$ elements has height $\Omega(n \lg n)$.

Proof

• $l \geq n!$

• By lemma, $n! \leq l \leq 2^h$ or $2^h \geq n!$

• Take logs: $h \geq \lg(n!)$

• Use Stirling’s approximation: $n! > (n/e)^n$ (by equation (3.16))

\begin{align*}
h &\geq \lg (n/e)^n \\
&= n \lg(n/e) \\
&= n \lg n - n \lg e \\
&= \Omega(n \lg n) . \qquad \blacksquare \text{ (theorem)}
\end{align*}

Now to prove the lemma:

Proof By induction on $h$.

Basis: $h = 0$. Tree is just one node, which is a leaf. $2^h = 1$.

Inductive step: Assume true for height $= h - 1$. Extend tree of height $h - 1$
by making as many new leaves as possible. Each leaf becomes parent to two new
leaves.

\begin{align*}
\text{\# of leaves for height } h &= 2 \cdot (\text{\# of leaves for height } h - 1) \\
&= 2 \cdot 2^{h-1} \quad \text{(ind. hypothesis)} \\
&= 2^h . \qquad \blacksquare \text{ (lemma)}
\end{align*}


Corollary 

Heapsort and merge sort are asymptotically optimal comparison sorts. 
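
To make the bound $h \geq \lg(n!)$ concrete, the Python sketch below computes $\lceil \lg(n!) \rceil$, the smallest height a decision tree with at least $n!$ leaves can have, and compares it with $n \lg n$. The function name is chosen for this illustration; exact integer arithmetic is used, relying on the identity $\lceil \lg x \rceil$ = bit length of $x - 1$ for integers $x \geq 1$.

```python
import math


def comparison_lower_bound(n):
    """ceil(lg(n!)): minimum worst-case number of comparisons needed by any
    comparison sort on n elements, according to the decision-tree argument."""
    return (math.factorial(n) - 1).bit_length()


for n in (3, 4, 5, 10, 100, 1000):
    print(n, comparison_lower_bound(n), round(n * math.log2(n)))
```

For $n = 3$ the bound is 3 comparisons, since $\lg 6 \approx 2.58$. As $n$ grows, the bound stays within a constant factor of $n \lg n$, which is the sense in which heapsort and merge sort are asymptotically optimal.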


Sorting in linear time 

Non-comparison sorts. 

Counting sort 

Depends on a key assumption: numbers to be sorted are integers in $\{0, 1, \ldots, k\}$.

Input: $A[1..n]$, where $A[j] \in \{0, 1, \ldots, k\}$ for $j = 1, 2, \ldots, n$. Array $A$ and
values $n$ and $k$ are given as parameters.

Output: $B[1..n]$, sorted. $B$ is assumed to be already allocated and is given as a
parameter.

Auxiliary storage: $C[0..k]$






Counting-Sort(A, B, n, k)
for i ← 0 to k
    do C[i] ← 0
for j ← 1 to n
    do C[A[j]] ← C[A[j]] + 1
for i ← 1 to k
    do C[i] ← C[i] + C[i - 1]
for j ← n downto 1
    do B[C[A[j]]] ← A[j]
       C[A[j]] ← C[A[j]] - 1

Do an example for $A = 2_1, 5_1, 3_1, 0_1, 2_2, 3_2, 0_2, 3_3$

Counting sort is stable (keys with same value appear in same order in output as
they did in input) because of how the last loop works.
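
The suggested example can be run with the following Python sketch. It is an adaptation rather than a transcription: indices are 0-based, so the count is decremented before an element is placed, and a hypothetical `key` parameter is added so that stability can be observed on tagged records.

```python
def counting_sort(a, k, key=lambda x: x):
    """Stable sort of `a`, whose keys are integers in 0..k."""
    c = [0] * (k + 1)
    for x in a:
        c[key(x)] += 1
    for i in range(1, k + 1):
        c[i] += c[i - 1]            # c[i] = number of elements with key <= i
    b = [None] * len(a)
    for x in reversed(a):           # right to left keeps equal keys in order
        c[key(x)] -= 1              # 0-based output: decrement, then place
        b[c[key(x)]] = x
    return b


print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
# [0, 0, 2, 2, 3, 3, 3, 5]

# The subscripts in the example become letter tags here.
records = [(2, "a"), (5, "b"), (3, "c"), (0, "d"),
           (2, "e"), (3, "f"), (0, "g"), (3, "h")]
print(counting_sort(records, 5, key=lambda rec: rec[0]))
# [(0, 'd'), (0, 'g'), (2, 'a'), (2, 'e'), (3, 'c'), (3, 'f'), (3, 'h'), (5, 'b')]
```

In the tagged output, records with equal keys keep their input order, which is the stability property that radix sort depends on.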


Analysis: $\Theta(n + k)$, which is $\Theta(n)$ if $k = O(n)$.
How big a k is practical? 

• Good for sorting 32-bit values? No. 

• 16-bit? Probably not. 

• 8-bit? Maybe, depending on n. 

• 4-bit? Probably (unless n is really small). 

Counting sort will be used in radix sort. 


Radix sort 


How IBM made its money. Punch card readers for census tabulation in early 
1900’s. Card sorters, worked on one column at a time. It's the algorithm for 
using the machine that extends the technique to multi-column sorting. The human 
operator was part of the algorithm! 

Key idea: Sort least significant digits first. 

To sort d digits: 

Radix-Sort(A, d)
for i ← 1 to d
    do use a stable sort to sort array A on digit i


Example:

input    sorted on    sorted on    sorted on
         digit 1      digit 2      digit 3
326      690          704          326
453      751          608          435
608      453          326          453
835      704          835          608
751      835          435          690
435      435          751          704
704      326          453          751
690      608          690          835

Correctness:

• Induction on number of passes ($i$ in pseudocode).

• Assume digits $1, 2, \ldots, i - 1$ are sorted.

• Show that a stable sort on digit $i$ leaves digits $1, \ldots, i$ sorted:

• If 2 digits in position $i$ are different, ordering by position $i$ is correct, and
positions $1, \ldots, i - 1$ are irrelevant.

• If 2 digits in position $i$ are equal, numbers are already in the right order (by
inductive hypothesis). The stable sort on digit $i$ leaves them in the right
order.

This argument shows why it’s so important to use a stable sort for intermediate
sort.
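
The passes of radix sort on the 8-number example above can be reproduced with a short Python sketch. As an assumption made for brevity, each pass distributes the numbers into per-digit lists, which is a stable stand-in for counting sort; the function names and the `base` parameter are chosen for this illustration, and digit 0 in the code is the least significant digit (digit 1 in the text).

```python
def stable_sort_by_digit(a, i, base=10):
    """Stable pass: order `a` by its i-th digit (i = 0 is least significant)."""
    buckets = [[] for _ in range(base)]
    for x in a:
        buckets[(x // base**i) % base].append(x)   # append keeps input order
    return [x for bucket in buckets for x in bucket]


def radix_sort(a, d, base=10):
    """Sort non-negative integers with at most d digits in the given base."""
    for i in range(d):
        a = stable_sort_by_digit(a, i, base)
    return a


data = [326, 453, 608, 835, 751, 435, 704, 690]
print(stable_sort_by_digit(data, 0))
# [690, 751, 453, 704, 835, 435, 326, 608]
print(radix_sort(data, 3))
# [326, 435, 453, 608, 690, 704, 751, 835]
```

After the first pass, 835 still precedes 435 (both end in 5) because the pass is stable; replacing it with an unstable sort could undo the work of earlier passes.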

Analysis: Assume that we use counting sort as the intermediate sort. 

• $\Theta(n + k)$ per pass (digits in range $0, \ldots, k$)

• $d$ passes

• $\Theta(d(n + k))$ total

• If $k = O(n)$, time $= \Theta(dn)$.

How to break each key into digits?

• $n$ words.

• $b$ bits/word.

• Break into $r$-bit digits. Have $d = \lceil b/r \rceil$.

• Use counting sort, $k = 2^r - 1$.

Example: 32-bit words, 8-bit digits. $b = 32$, $r = 8$, $d = \lceil 32/8 \rceil = 4$, $k =
2^8 - 1 = 255$.

• Time $= \Theta(\frac{b}{r}(n + 2^r))$.

How to choose $r$? Balance $b/r$ and $n + 2^r$. Choosing $r \approx \lg n$ gives us
$\Theta(\frac{b}{\lg n}(n + n)) = \Theta(bn / \lg n)$.

• If we choose $r < \lg n$, then $b/r > b/\lg n$, and $n + 2^r$ term doesn’t improve.

• If we choose $r > \lg n$, then $n + 2^r$ term gets big. Example: $r = 2 \lg n \Rightarrow
2^r = 2^{2 \lg n} = (2^{\lg n})^2 = n^2$.

So, to sort $2^{16}$ 32-bit numbers, use $r = \lg 2^{16} = 16$ bits. $\lceil b/r \rceil = 2$ passes.
Compare radix sort to merge sort and quicksort:

• 1 million ($2^{20}$) 32-bit integers.

• Radix sort: $\lceil 32/20 \rceil = 2$ passes.

• Merge sort/quicksort: $\lg n = 20$ passes.

• Remember, though, that each radix sort “pass” is really 2 passes—one to take 
census, and one to move data. 

How does radix sort violate the ground rules for a comparison sort? 

• Using counting sort allows us to gain information about keys by means other 
than directly comparing 2 keys. 

• Used keys as array indices. 


Bucket sort

Assumes the input is generated by a random process that distributes elements uni¬ 
formly over [0, 1). 

Idea: 

• Divide [0, 1) into n equal-sized buckets. 

• Distribute the n input values into the buckets. 

• Sort each bucket. 

• Then go through buckets in order, listing elements in each one. 

Input: $A[1..n]$, where $0 \leq A[i] < 1$ for all $i$.

Auxiliary array: $B[0..n-1]$ of linked lists, each list initially empty.

Bucket-Sort(A, n)
for i ← 1 to n
    do insert A[i] into list B[⌊n · A[i]⌋]
for i ← 0 to n - 1
    do sort list B[i] with insertion sort
concatenate lists B[0], B[1], ..., B[n - 1] together in order
return the concatenated lists

Correctness: Consider $A[i]$, $A[j]$. Assume without loss of generality that
$A[i] \leq A[j]$. Then $\lfloor n \cdot A[i] \rfloor \leq \lfloor n \cdot A[j] \rfloor$. So $A[i]$ is placed into the same bucket
as $A[j]$ or into a bucket with a lower index.

• If same bucket, insertion sort fixes up. 

• If earlier bucket, concatenation of lists fixes up. 
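
The bucket-sort procedure can be sketched in Python as below. This is an illustrative sketch under stated assumptions: Python lists stand in for the linked lists, the name `bucket_sort` is hypothetical, and the `min(..., n - 1)` guard is an addition that protects against floating-point rounding pushing a product up to n.

```python
def bucket_sort(a):
    """Sort values in [0, 1) using n buckets and insertion sort per bucket."""
    n = len(a)
    buckets = [[] for _ in range(n)]          # B[0 .. n-1], each initially empty
    for x in a:
        # floor(n * x); the min() guard is an assumption added for float safety.
        buckets[min(int(n * x), n - 1)].append(x)
    out = []
    for bucket in buckets:
        # Insertion sort within one bucket.
        for j in range(1, len(bucket)):
            key = bucket[j]
            i = j - 1
            while i >= 0 and bucket[i] > key:
                bucket[i + 1] = bucket[i]
                i -= 1
            bucket[i + 1] = key
        out.extend(bucket)                    # concatenate buckets in index order
    return out
```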

Analysis: 

• Relies on no bucket getting too many values. 

• All lines of algorithm except insertion sorting take $\Theta(n)$ altogether.

• Intuitively, if each bucket gets a constant number of elements, it takes $O(1)$
time to sort each bucket $\Rightarrow$ $O(n)$ sort time for all buckets.

• We “expect” each bucket to have few elements, since the average is 1 element 
per bucket. 

• But we need to do a careful analysis. 

Define a random variable:

• $n_i$ = the number of elements placed in bucket $B[i]$.

Because insertion sort runs in quadratic time, bucket sort time is

\[ T(n) = \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2) . \]

Take expectations of both sides:

\[
\begin{aligned}
E[T(n)] &= E\left[\Theta(n) + \sum_{i=0}^{n-1} O(n_i^2)\right] \\
        &= \Theta(n) + \sum_{i=0}^{n-1} E[O(n_i^2)] && \text{(linearity of expectation)} \\
        &= \Theta(n) + \sum_{i=0}^{n-1} O(E[n_i^2]) && (E[aX] = aE[X])
\end{aligned}
\]


Claim

$E[n_i^2] = 2 - (1/n)$ for $i = 0, \ldots, n - 1$.

Proof of claim

Define indicator random variables:

• $X_{ij} = I\{A[j] \text{ falls in bucket } i\}$

• $\Pr\{A[j] \text{ falls in bucket } i\} = 1/n$

• $n_i = \sum_{j=1}^{n} X_{ij}$


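Then

\[
\begin{aligned}
E[n_i^2] &= E\left[\left(\sum_{j=1}^{n} X_{ij}\right)^2\right] \\
         &= E\left[\sum_{j=1}^{n} X_{ij}^2 + 2 \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} X_{ij} X_{ik}\right] \\
         &= \sum_{j=1}^{n} E[X_{ij}^2] + 2 \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} E[X_{ij} X_{ik}] && \text{(linearity of expectation)}
\end{aligned}
\]

\[
\begin{aligned}
E[X_{ij}^2] &= 0^2 \cdot \Pr\{A[j] \text{ doesn't fall in bucket } i\} + 1^2 \cdot \Pr\{A[j] \text{ falls in bucket } i\} \\
            &= \frac{1}{n}
\end{aligned}
\]

$E[X_{ij} X_{ik}]$ for $j \ne k$: Since $j \ne k$, $X_{ij}$ and $X_{ik}$ are independent random variables

\[
\begin{aligned}
\Rightarrow E[X_{ij} X_{ik}] &= E[X_{ij}] \, E[X_{ik}] \\
                             &= \frac{1}{n} \cdot \frac{1}{n} \\
                             &= \frac{1}{n^2}
\end{aligned}
\]
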


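Therefore:

\[
\begin{aligned}
E[n_i^2] &= \sum_{j=1}^{n} \frac{1}{n} + 2 \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} \frac{1}{n^2} \\
         &= n \cdot \frac{1}{n} + 2 \binom{n}{2} \frac{1}{n^2} \\
         &= 1 + 2 \cdot \frac{n(n-1)}{2} \cdot \frac{1}{n^2} \\
         &= 1 + \frac{n-1}{n} \\
         &= 1 + 1 - \frac{1}{n} \\
         &= 2 - \frac{1}{n}
\end{aligned}
\]

■ (claim)

Therefore:

\[
\begin{aligned}
E[T(n)] &= \Theta(n) + \sum_{i=0}^{n-1} O(2 - 1/n) \\
        &= \Theta(n) + O(n) \\
        &= \Theta(n)
\end{aligned}
\]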
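
As a sanity check on the claim, the second moment can be computed exactly for small n. The sketch below is an illustration, not part of the original proof. It assumes the n keys fall into buckets independently, so that $n_i$ is binomial with parameters n and 1/n; the function name `second_moment` is hypothetical.

```python
from fractions import Fraction
from math import comb

def second_moment(n):
    """Exact E[n_i^2] when each of n keys lands in bucket i with probability 1/n."""
    p = Fraction(1, n)
    # Sum m^2 * Pr{n_i = m} over all possible bucket sizes m.
    return sum(
        m * m * comb(n, m) * p**m * (1 - p)**(n - m)
        for m in range(n + 1)
    )
```

For every n tried, the value agrees with $2 - 1/n$, for example `second_moment(4) == Fraction(7, 4)`.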


• Again, not a comparison sort. Used a function of key values to index into an 
array. 

• This is a probabilistic analysis—we used probability to analyze an algorithm 
whose running time depends on the distribution of inputs. 

• Different from a randomized algorithm, where we use randomization to impose
a distribution.

• With bucket sort, if the input isn’t drawn from a uniform distribution on [0, 1), 
all bets are off (performance-wise, but the algorithm is still correct). 



Solutions for Chapter 8: 
Sorting in Linear Time 


Solution to Exercise 8.1-3 

If the sort runs in linear time for m input permutations, then the height h of the 
portion of the decision tree consisting of the m corresponding leaves and their 
ancestors is linear. 

Use the same argument as in the proof of Theorem 8.1 to show that this is impossible
for $m = n!/2$, $n!/n$, or $n!/2^n$.

We have $2^h \ge m$, which gives us $h \ge \lg m$. For all the possible m's given here,
$\lg m = \Omega(n \lg n)$, hence $h = \Omega(n \lg n)$.

In particular,

\[
\begin{aligned}
\lg \frac{n!}{2}   &= \lg n! - 1     \ge n \lg n - n \lg e - 1 \\
\lg \frac{n!}{n}   &= \lg n! - \lg n \ge n \lg n - n \lg e - \lg n \\
\lg \frac{n!}{2^n} &= \lg n! - n     \ge n \lg n - n \lg e - n
\end{aligned}
\]


Solution to Exercise 8.1-4 

Let S be a sequence of n elements divided into n/k subsequences each of length k
where all of the elements in any subsequence are larger than all of the elements
of a preceding subsequence and smaller than all of the elements of a succeeding
subsequence.

Claim

Any comparison-based sorting algorithm to sort S must take $\Omega(n \lg k)$ time in the
worst case.

Proof First notice that, as pointed out in the hint, we cannot prove the lower
bound by multiplying together the lower bounds for sorting each subsequence.
That would only prove that there is no faster algorithm that sorts the subsequences
independently. This is not what we are asked to prove; we cannot introduce any
extra assumptions.

Now, consider the decision tree of height h for any comparison sort for S. Since
the elements of each subsequence can be in any order, any of the $k!$ permutations
correspond to the final sorted order of a subsequence. And, since there are n/k such
subsequences, each of which can be in any order, there are $(k!)^{n/k}$ permutations
of S that could correspond to the sorting of some input order. Thus, any decision
tree for sorting S must have at least $(k!)^{n/k}$ leaves. Since a binary tree of height h
has no more than $2^h$ leaves, we must have $2^h \ge (k!)^{n/k}$ or $h \ge \lg((k!)^{n/k})$. We
therefore obtain

\[
\begin{aligned}
h &\ge \lg((k!)^{n/k}) \\
  &= (n/k) \lg(k!) \\
  &\ge (n/k) \lg((k/2)^{k/2}) \\
  &= (n/2) \lg(k/2) .
\end{aligned}
\]

The third line comes from $k!$ having its $k/2$ largest terms being at least $k/2$ each.
(We implicitly assume here that k is even. We could adjust with floors and ceilings
if k were odd.)

Since there exists at least one path in any decision tree for sorting S that has length
at least $(n/2) \lg(k/2)$, the worst-case running time of any comparison-based sorting
algorithm for S is $\Omega(n \lg k)$. ■


Solution to Exercise 8.2-3 

[The following solution also answers Exercise 8.2-2.] 

Notice that the correctness argument in the text does not depend on the order in 
which A is processed. The algorithm is correct no matter what order is used! 

But the modified algorithm is not stable. As before, in the final for loop an element
equal to one taken from A earlier is placed before the earlier one (i.e., at a lower
index position) in the output array B. The original algorithm was stable because
an element taken from A later started out with a lower index than one taken earlier.
But in the modified algorithm, an element taken from A later started out with a
higher index than one taken earlier.

In particular, the algorithm still places the elements with value k in positions
$C[k - 1] + 1$ through $C[k]$, but in the reverse order of their appearance in A.
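
The effect of the scan direction can be seen in a small Python sketch. It is illustrative only: records are assumed to be (key, tag) pairs with integer keys in 0..k, the name `counting_sort` and the flag `reverse_scan` are hypothetical, and indices are 0-based rather than the 1-based indices of the text.

```python
def counting_sort(records, k, reverse_scan=True):
    """Sort a list of (key, tag) records by key in 0..k.

    reverse_scan=True walks the input from last to first (the stable version);
    reverse_scan=False walks it from first to last, as in the modified algorithm.
    """
    count = [0] * (k + 1)
    for key, _ in records:
        count[key] += 1
    for v in range(1, k + 1):
        count[v] += count[v - 1]          # count[v] = number of keys <= v
    out = [None] * len(records)
    order = reversed(records) if reverse_scan else records
    for rec in order:
        count[rec[0]] -= 1                # 0-based slot for this record
        out[count[rec[0]]] = rec
    return out
```

Both directions produce output sorted by key, but the forward scan lists equal keys in the reverse of their input order.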


Solution to Exercise 8.2-4 

Compute the C array as is done in counting sort. The number of integers in the
range $[a .. b]$ is $C[b] - C[a - 1]$, where we interpret $C[-1]$ as 0.
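
A minimal Python sketch of this preprocessing and query follows. The names `preprocess` and `count_in_range` are hypothetical, and the input is assumed to consist of integers in 0..k.

```python
def preprocess(a, k):
    """Return C where C[v] = number of elements of a that are <= v, for v in 0..k."""
    c = [0] * (k + 1)
    for x in a:
        c[x] += 1
    for v in range(1, k + 1):
        c[v] += c[v - 1]
    return c

def count_in_range(c, lo, hi):
    """Number of input integers in [lo .. hi], answered in O(1) time."""
    below = c[lo - 1] if lo > 0 else 0    # interpret C[-1] as 0
    return c[hi] - below
```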


Solution to Exercise 8.3-2

Insertion sort is stable. When inserting $A[j]$ into the sorted sequence $A[1 .. j-1]$,
we do it the following way: compare $A[j]$ to $A[i]$, starting with $i = j - 1$ and going
down to $i = 1$. Continue as long as $A[j] < A[i]$.

Merge sort as defined is stable, because when two elements compared are equal, the 
tie is broken by taking the element from array L which keeps them in the original 
order. 

Heapsort and quicksort are not stable. 

One scheme that makes a sorting algorithm stable is to store the index of each 
element (the element’s place in the original ordering) with the element. When 
comparing two elements, compare them by their values and break ties by their 
indices. 

Additional space requirements: For n elements, their indices are $1 .. n$. Each can
be written in $\lg n$ bits, so together they take $O(n \lg n)$ additional space.

Additional time requirements: The worst case is when all elements are equal. The 
asymptotic time does not change because we add a constant amount of work to 
each comparison. 
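
The index-tagging scheme can be sketched in Python using the standard `heapq` module as the underlying (normally unstable) heap-based sort. This is an assumption-laden illustration: the function name `stable_heap_sort` and its `key` parameter are hypothetical, and 0-based indices are used.

```python
import heapq

def stable_heap_sort(items, key):
    """Sort items by key(item); ties are broken by original position."""
    # Tag each item with (key, original index). The index is unique, so every
    # comparison is decided before the item itself would be compared.
    heap = [(key(item), idx, item) for idx, item in enumerate(items)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

Each comparison now inspects at most one extra field, which is the constant amount of added work per comparison described above.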


Solution to Exercise 8.3-3 

Basis: If $d = 1$, there's only one digit, so sorting on that digit sorts the array.

Inductive step: Assuming that radix sort works for $d - 1$ digits, we'll show that it
works for $d$ digits.

Radix sort sorts separately on each digit, starting from digit 1. Thus, radix sort of
$d$ digits, which sorts on digits $1, \ldots, d$ is equivalent to radix sort of the low-order
$d - 1$ digits followed by a sort on digit $d$. By our induction hypothesis, the sort of
the low-order $d - 1$ digits works, so just before the sort on digit $d$, the elements are
in order according to their low-order $d - 1$ digits.

The sort on digit $d$ will order the elements by their $d$th digit. Consider two elements,
$a$ and $b$, with $d$th digits $a_d$ and $b_d$ respectively.

• If $a_d < b_d$, the sort will put $a$ before $b$, which is correct, since $a < b$ regardless
of the low-order digits.

• If $a_d > b_d$, the sort will put $a$ after $b$, which is correct, since $a > b$ regardless
of the low-order digits.

• If $a_d = b_d$, the sort will leave $a$ and $b$ in the same order they were in, because
it is stable. But that order is already correct, since the correct order of $a$ and $b$
is determined by the low-order $d - 1$ digits when their $d$th digits are equal, and
the elements are already sorted by their low-order $d - 1$ digits.

If the intermediate sort were not stable, it might rearrange elements whose $d$th
digits were equal, even though those elements were in the right order after the sort
on their lower-order digits.


Solution to Exercise 8.3-4

Treat the numbers as 2-digit numbers in radix n. Each digit ranges from 0 to $n - 1$.
Sort these 2-digit numbers with radix sort.

There are 2 calls to counting sort, each taking $\Theta(n + n) = \Theta(n)$ time, so that the
total time is $\Theta(n)$.
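
The two-pass idea can be sketched in Python as below. This is an illustration under assumptions: the input is taken to be a list of n integers, each in the range 0 to n*n - 1, and the name `sort_below_n_squared` is hypothetical.

```python
def sort_below_n_squared(a):
    """Sort n integers in the range 0 .. n*n - 1 with two counting-sort passes."""
    n = len(a)
    for divisor in (1, n):                 # low-order radix-n digit, then high-order
        count = [0] * n
        for x in a:
            count[(x // divisor) % n] += 1
        for v in range(1, n):
            count[v] += count[v - 1]
        out = [0] * n
        for x in reversed(a):              # reverse scan keeps each pass stable
            digit = (x // divisor) % n
            count[digit] -= 1
            out[count[digit]] = x
        a = out
    return a
```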


Solution to Exercise 8.4-2 

The worst-case running time for the bucket-sort algorithm occurs when the assumption
of uniformly distributed input does not hold. If, for example, all the input ends
up in the first bucket, then in the insertion sort phase it needs to sort all the input,
which takes $O(n^2)$ time.

A simple change that will preserve the linear expected running time and make the
worst-case running time $O(n \lg n)$ is to use a worst-case $O(n \lg n)$-time algorithm
like merge sort instead of insertion sort when sorting the buckets.


Solution to Problem 8-1 

a. For a comparison algorithm A to sort, no two input permutations can reach the
same leaf of the decision tree, so there must be at least $n!$ leaves reached in $T_A$,
one for each possible input permutation. Since A is a deterministic algorithm, it
must always reach the same leaf when given a particular permutation as input,
so at most $n!$ leaves are reached (one for each permutation). Therefore exactly
$n!$ leaves are reached, one for each input permutation.

These $n!$ leaves will each have probability $1/n!$, since each of the $n!$ possible
permutations is the input with the probability $1/n!$. Any remaining leaves will
have probability 0, since they are not reached for any input.

Without loss of generality, we can assume for the rest of this problem that paths
leading only to 0-probability leaves aren't in the tree, since they cannot affect
the running time of the sort. That is, we can assume that $T_A$ consists of only the
$n!$ leaves labeled $1/n!$ and their ancestors.

b. If $k > 1$, then the root of T is not a leaf. This implies that all of T's leaves
are leaves in LT and RT. Since every leaf at depth h in LT or RT has depth
$h + 1$ in T, $D(T)$ must be the sum of $D(LT)$, $D(RT)$, and k, the total number
of leaves. To prove this last assertion, let $d_T(x)$ = depth of node x in tree T.
Then,

\[
\begin{aligned}
D(T) &= \sum_{x \in \text{leaves}(T)} d_T(x) \\
     &= \sum_{x \in \text{leaves}(LT)} d_T(x) + \sum_{x \in \text{leaves}(RT)} d_T(x) \\
     &= \sum_{x \in \text{leaves}(LT)} (d_{LT}(x) + 1) + \sum_{x \in \text{leaves}(RT)} (d_{RT}(x) + 1) \\
     &= \sum_{x \in \text{leaves}(LT)} d_{LT}(x) + \sum_{x \in \text{leaves}(RT)} d_{RT}(x) + \sum_{x \in \text{leaves}(T)} 1 \\
     &= D(LT) + D(RT) + k .
\end{aligned}
\]

c. To show that $d(k) = \min_{1 \le i \le k-1} \{d(i) + d(k - i) + k\}$ we will show separately
that

\[ d(k) \le \min_{1 \le i \le k-1} \{d(i) + d(k - i) + k\} \]

and

\[ d(k) \ge \min_{1 \le i \le k-1} \{d(i) + d(k - i) + k\} . \]

• To show that $d(k) \le \min_{1 \le i \le k-1} \{d(i) + d(k - i) + k\}$, we need only show
that $d(k) \le d(i) + d(k - i) + k$, for $i = 1, 2, \ldots, k - 1$. For any i from 1 to
$k - 1$ we can find trees RT with i leaves and LT with $k - i$ leaves such that
$D(RT) = d(i)$ and $D(LT) = d(k - i)$. Construct T such that RT and LT
are the right and left subtrees of T's root respectively. Then

\[
\begin{aligned}
d(k) &\le D(T) && \text{(by definition of $d$ as min $D(T)$ value)} \\
     &= D(RT) + D(LT) + k && \text{(by part (b))} \\
     &= d(i) + d(k - i) + k && \text{(by choice of $RT$ and $LT$)} .
\end{aligned}
\]

• To show that $d(k) \ge \min_{1 \le i \le k-1} \{d(i) + d(k - i) + k\}$, we need only show
that $d(k) \ge d(i) + d(k - i) + k$, for some i in $\{1, 2, \ldots, k - 1\}$. Take the
tree T with k leaves such that $D(T) = d(k)$, let RT and LT be T's right
and left subtree, respectively, and let i be the number of leaves in RT. Then
$k - i$ is the number of leaves in LT and

\[
\begin{aligned}
d(k) &= D(T) && \text{(by choice of $T$)} \\
     &= D(RT) + D(LT) + k && \text{(by part (b))} \\
     &\ge d(i) + d(k - i) + k && \text{(by definition of $d$ as min $D(T)$ value)} .
\end{aligned}
\]

Neither i nor $k - i$ can be 0 (and hence $1 \le i \le k - 1$), since if one of these
were 0, either RT or LT would contain all k leaves of T, and that k-leaf
subtree would have a D equal to $D(T) - k$ (by part (b)), contradicting the
choice of T as the k-leaf tree with the minimum D.


d. Let $f_k(i) = i \lg i + (k - i) \lg(k - i)$. To find the value of i that minimizes $f_k$,
find the i for which the derivative of $f_k$ with respect to i is 0:

\[
\begin{aligned}
f'_k(i) &= \frac{d}{di} \left( \frac{i \ln i + (k - i) \ln(k - i)}{\ln 2} \right) \\
        &= \frac{\ln i + 1 - \ln(k - i) - 1}{\ln 2} \\
        &= \frac{\ln i - \ln(k - i)}{\ln 2}
\end{aligned}
\]

is 0 at $i = k/2$. To verify this is indeed a minimum (not a maximum), check
that the second derivative of $f_k$ is positive at $i = k/2$:

\[
\begin{aligned}
f''_k(i)   &= \frac{d}{di} \left( \frac{\ln i - \ln(k - i)}{\ln 2} \right) \\
           &= \frac{1}{\ln 2} \left( \frac{1}{i} + \frac{1}{k - i} \right) \\
f''_k(k/2) &= \frac{1}{\ln 2} \left( \frac{2}{k} + \frac{2}{k} \right) \\
           &= \frac{1}{\ln 2} \cdot \frac{4}{k} \\
           &> 0 \quad \text{since } k > 1 .
\end{aligned}
\]

Now we use substitution to prove $d(k) = \Omega(k \lg k)$. The base case of the
induction is satisfied because $d(1) \ge 0 = c \cdot 1 \cdot \lg 1$ for any constant c. For the
inductive step we assume that $d(i) \ge c i \lg i$ for $1 \le i \le k - 1$, where c is some
constant to be determined.

\[
\begin{aligned}
d(k) &= \min_{1 \le i \le k-1} \{d(i) + d(k - i) + k\} \\
     &\ge \min_{1 \le i \le k-1} \{c (i \lg i + (k - i) \lg(k - i)) + k\} \\
     &= \min_{1 \le i \le k-1} \{c f_k(i) + k\} \\
     &= c k \lg\left(\frac{k}{2}\right) + k \\
     &= c (k \lg k - k) + k \\
     &= c k \lg k + (k - c k) \\
     &\ge c k \lg k \quad \text{if } c \le 1 ,
\end{aligned}
\]

and so $d(k) = \Omega(k \lg k)$.
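
The recurrence from part (c) can also be evaluated numerically to see the bound from part (d) in action. The sketch below is illustrative only: the function name `min_external_path_lengths` is hypothetical, and it assumes $d(1) = 0$ (a one-leaf tree whose single leaf has depth 0).

```python
from math import log2

def min_external_path_lengths(kmax):
    """d[k] = min over 1 <= i <= k-1 of d[i] + d[k-i] + k, with d[1] = 0."""
    d = [0, 0]                      # d[0] is unused padding; d[1] = 0
    for k in range(2, kmax + 1):
        d.append(min(d[i] + d[k - i] + k for i in range(1, k)))
    return d
```

For the values tried, `d[k] >= k * log2(k)` holds, which is the bound with $c = 1$; for example `d[8] == 24`, which equals $8 \lg 8$.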

e. Using the result of part (d) and the fact that $T_A$ (as modified in our solution to
part (a)) has $n!$ leaves, we can conclude that

\[ D(T_A) \ge d(n!) = \Omega(n! \lg(n!)) . \]

$D(T_A)$ is the sum of the decision-tree path lengths for sorting all input permutations,
and the path lengths are proportional to the run time. Since the $n!$
permutations have equal probability $1/n!$, the expected time to sort n random
elements (1 input permutation) is the total time for all permutations divided
by $n!$:

\[ \frac{\Omega(n! \lg(n!))}{n!} = \Omega(\lg(n!)) = \Omega(n \lg n) . \]

f. We will show how to modify a randomized decision tree (algorithm) to define a
deterministic decision tree (algorithm) that is at least as good as the randomized
one in terms of the average number of comparisons.

At each randomized node, pick the child with the smallest subtree (the subtree
with the smallest average number of comparisons on a path to a leaf). Delete all
the other children of the randomized node and splice out the randomized node
itself.

The deterministic algorithm corresponding to this modified tree still works, because
the randomized algorithm worked no matter which path was taken from
each randomized node.

The average number of comparisons for the modified algorithm is no larger
than the average number for the original randomized tree, since we discarded
the higher-average subtrees in each case. In particular, each time we splice out
a randomized node, we leave the overall average less than or equal to what it
was, because

• the same set of input permutations reaches the modified subtree as before, but
those inputs are handled in average time less than or equal to before, and

• the rest of the tree is unmodified.

The randomized algorithm thus takes at least as much time on average as the
corresponding deterministic one. (We've shown that the expected running time
for a deterministic comparison sort is $\Omega(n \lg n)$, hence the expected time for a
randomized comparison sort is also $\Omega(n \lg n)$.)


Solution to Problem 8-3 

a. The usual, unadorned radix sort algorithm will not solve this problem in the
required time bound. The number of passes, d, would have to be the number
of digits in the largest integer. Suppose that there are m integers; we always
have $m \le n$. In the worst case, we would have one integer with $n/2$ digits and
$n/2$ integers with one digit each. We assume that the range of a single digit is
constant. Therefore, we would have $d = n/2$ and $m = n/2 + 1$, and so the
running time would be $\Theta(dm) = \Theta(n^2)$.

Let us assume without loss of generality that all the integers are positive and
have no leading zeros. (If there are negative integers or 0, deal with the positive
numbers, negative numbers, and 0 separately.) Under this assumption, we can
observe that integers with more digits are always greater than integers with
fewer digits. Thus, we can first sort the integers by number of digits (using
counting sort), and then use radix sort to sort each group of integers with the
same length. Noting that each integer has between 1 and n digits, let $m_i$ be the
number of integers with i digits, for $i = 1, 2, \ldots, n$. Since there are n digits
altogether, we have $\sum_{i=1}^{n} i \cdot m_i = n$.

It takes $O(n)$ time to compute how many digits all the integers have and, once
the numbers of digits have been computed, it takes $O(m + n) = O(n)$ time
to group the integers by number of digits. To sort the group of $m_i$ integers with
i digits by radix sort takes $\Theta(i \cdot m_i)$ time. The time to sort all groups, therefore, is

\[
\begin{aligned}
\sum_{i=1}^{n} \Theta(i \cdot m_i) &= \Theta\left(\sum_{i=1}^{n} i \cdot m_i\right) \\
                                   &= \Theta(n) .
\end{aligned}
\]

b. One way to solve this problem is by a radix sort from right to left. Since the
strings have varying lengths, however, we have to pad out all strings that are
shorter than the longest string. The padding is on the right end of the string,
and it's with a special character that is lexicographically less than any other
character (e.g., in C, the character '\0' with ASCII value 0). Of course, we
don't have to actually change any string; if we want to know the jth character of
a string whose length is k, then if $j > k$, the jth character is the pad character.

Unfortunately, this scheme does not always run in the required time bound.
Suppose that there are m strings and that the longest string has d characters.
In the worst case, one string has $n/2$ characters and, before padding, $n/2$
strings have one character each. As in part (a), we would have $d = n/2$ and
$m = n/2 + 1$. We still have to examine the pad characters in each pass of radix
sort, even if we don't actually create them in the strings. Assuming that the
range of a single character is constant, the running time of radix sort would be
$\Theta(dm) = \Theta(n^2)$.

To solve the problem in $O(n)$ time, we use the property that, if the first letter
of string x is lexicographically less than the first letter of string y, then x is
lexicographically less than y, regardless of the lengths of the two strings. We
take advantage of this property by sorting the strings on the first letter, using
counting sort. We take an empty string as a special case and put it first. We
gather together all strings with the same first letter as a group. Then we recurse,
within each group, based on each string with the first letter removed.

The correctness of this algorithm is straightforward. Analyzing the running
time is a bit trickier. Let us count the number of times that each string is sorted
by a call of counting sort. Suppose that the ith string, $s_i$, has length $l_i$. Then
$s_i$ is sorted by at most $l_i + 1$ counting sorts. (The "+1" is because it may have
to be sorted as an empty string at some point; for example, ab and a end up in
the same group in the first pass and are then ordered based on b and the empty
string in the second pass. The string a is sorted its length, 1, time plus one more
time.) A call of counting sort on t strings takes $\Theta(t)$ time (remembering that
the number of different characters on which we are sorting is a constant). Thus,
the total time for all calls of counting sort is

\[
\begin{aligned}
O\left(\sum_{i=1}^{m} (l_i + 1)\right) &= O\left(\sum_{i=1}^{m} l_i + m\right) \\
                                       &= O(n + m) \\
                                       &= O(n) ,
\end{aligned}
\]

where the second line follows from $\sum_{i=1}^{m} l_i = n$, and the last line is because
$m \le n$.
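
The first-letter recursion can be sketched in Python as follows. This is a rough illustration, not the manual's implementation: the name `sort_strings` and the `pos` parameter (an index into each string, used instead of physically removing the first letter) are assumptions, and `sorted(groups)` over the distinct characters stands in for a counting sort over a constant-size alphabet. The recursion depth equals the longest string length, so very long strings would need an iterative variant.

```python
def sort_strings(strings, pos=0):
    """Sort strings lexicographically by grouping on the character at index pos."""
    if len(strings) <= 1:
        return list(strings)
    done = []                 # strings with no character at pos: the "empty string" case
    groups = {}               # character at pos -> strings sharing that character
    for s in strings:
        if len(s) == pos:
            done.append(s)
        else:
            groups.setdefault(s[pos], []).append(s)
    out = done                # exhausted strings come first
    for ch in sorted(groups): # stand-in for counting sort on one character
        out.extend(sort_strings(groups[ch], pos + 1))
    return out
```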


Solution to Problem 8-4 

a. Compare each red jug with each blue jug. Since there are n red jugs and n blue
jugs, that will take $\Theta(n^2)$ comparisons in the worst case.

b. To solve the problem, an algorithm has to perform a series of comparisons
until it has enough information to determine the matching. We can view the
computation of the algorithm in terms of a decision tree. Every internal node
is labeled with two jugs (one red, one blue) which we compare, and has three
outgoing edges (red jug smaller, same size, or larger than the blue jug). The
leaves are labeled with a unique matching of jugs.

The height of the decision tree is equal to the worst-case number of comparisons
the algorithm has to make to determine the matching. To bound that size, let us
first compute the number of possible matchings for n red and n blue jugs.

If we label the red jugs from 1 to n and we also label the blue jugs from 1
to n before starting the comparisons, every outcome of the algorithm can be
represented as a set

\[ \{(i, \pi(i)) : 1 \le i \le n \text{ and } \pi \text{ is a permutation on } \{1, \ldots, n\}\} , \]

which contains the pairs of red jugs (first component) and blue jugs (second
component) that are matched up. Since every permutation $\pi$ corresponds to a
different outcome, there must be exactly $n!$ different results.

Now we can bound the height h of our decision tree. Every tree with a branching
factor of 3 (every inner node has at most three children) has at most $3^h$
leaves. Since the decision tree must have at least $n!$ leaves, it follows that

\[ 3^h \ge n! \ge (n/e)^n \;\Rightarrow\; h \ge n \log_3 n - n \log_3 e = \Omega(n \lg n) . \]

So any algorithm solving the problem must use $\Omega(n \lg n)$ comparisons.

c. Assume that the red jugs are labeled with numbers $1, 2, \ldots, n$ and so are the
blue jugs. The numbers are arbitrary and do not correspond to the volumes of
jugs, but are just used to refer to the jugs in the algorithm description. Moreover,
the output of the algorithm will consist of n distinct pairs $(i, j)$, where the red
jug i and the blue jug j have the same volume.

The procedure Match-Jugs takes as input two sets representing jugs to be
matched: $R \subseteq \{1, \ldots, n\}$, representing red jugs, and $B \subseteq \{1, \ldots, n\}$, representing
blue jugs. We will call the procedure only with inputs that can be
matched; one necessary condition is that $|R| = |B|$.

Match-Jugs(R, B)
if |R| = 0                       ▷ Sets are empty
   then return
if |R| = 1                       ▷ Sets contain just one jug each
   then let R = {r} and B = {b}
        output "(r, b)"
        return
   else r ← a randomly chosen jug in R
        compare r to every jug of B
        B_< ← the set of jugs in B that are smaller than r
        B_> ← the set of jugs in B that are larger than r
        b ← the one jug in B with the same size as r
        compare b to every jug of R - {r}
        R_< ← the set of jugs in R that are smaller than b
        R_> ← the set of jugs in R that are larger than b
        output "(r, b)"
        Match-Jugs(R_<, B_<)
        Match-Jugs(R_>, B_>)
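
The procedure can be sketched in Python as below. This is a simplified illustration under assumptions: each jug is represented directly by its volume (a number), volumes within one color are distinct, the name `match_jugs` is hypothetical, and the function returns the list of pairs instead of printing them. As in the pseudocode, only red-to-blue comparisons are made.

```python
import random

def match_jugs(red, blue):
    """Return (red, blue) pairs of equal volume, comparing only red to blue jugs."""
    if not red:
        return []
    r = random.choice(red)                       # randomly chosen red jug
    smaller_b = [x for x in blue if x < r]       # blue jugs smaller than r
    larger_b = [x for x in blue if x > r]        # blue jugs larger than r
    b = next(x for x in blue if x == r)          # the one blue jug matching r
    smaller_r = [x for x in red if x < b]        # red jugs smaller than b
    larger_r = [x for x in red if x > b]         # red jugs larger than b
    return ([(r, b)]
            + match_jugs(smaller_r, smaller_b)
            + match_jugs(larger_r, larger_b))
```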

Correctness can be seen as follows (remember that $|R| = |B|$ in each call).
Once we pick r randomly from R, there will be a matching among the jugs in
volume smaller than r (which are in the sets $R_<$ and $B_<$), and likewise between
the jugs larger than r (which are in $R_>$ and $B_>$). Termination is also easy to see:
since $|R_<| + |R_>| < |R|$ in every recursive step, the size of the first parameter
reduces with every recursive call. It eventually must reach 0 or 1, in which case
the recursion terminates.

What about the running time? The analysis of the expected number of comparisons
is similar to that of the quicksort algorithm in Section 7.4.2. Let us
order the jugs as $r_1, \ldots, r_n$ and $b_1, \ldots, b_n$ where $r_i < r_{i+1}$ and $b_i < b_{i+1}$ for
$i = 1, \ldots, n - 1$, and $r_i = b_i$. Our analysis uses indicator random variables

\[ X_{ij} = I\{\text{red jug } r_i \text{ is compared to blue jug } b_j\} . \]

As in quicksort, a given pair $r_i$ and $b_j$ is compared at most once. When we
compare $r_i$ to every jug in B, jug $r_i$ will not be put in either $R_<$ or $R_>$. When
we compare $b_i$ to every jug in $R - \{r_i\}$, jug $b_i$ is not put into either $B_<$ or $B_>$.
The total number of comparisons is

\[ X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij} . \]

To calculate the expected value of X, we follow the quicksort analysis to arrive
at

\[ E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr\{r_i \text{ is compared to } b_j\} . \]

As in the quicksort analysis, once we choose a jug $r_k$ such that $r_i < r_k < b_j$, we
will put $r_i$ in $R_<$ and $b_j$ in $B_>$, and so $r_i$ and $b_j$ will never be compared again.
Let us denote $R_{ij} = \{r_i, \ldots, r_j\}$. Then jugs $r_i$ and $b_j$ will be compared if and
only if the first jug in $R_{ij}$ to be chosen is either $r_i$ or $r_j$.

Still following the quicksort analysis, until a jug from $R_{ij}$ is chosen, the entire
set $R_{ij}$ is together. Any jug in $R_{ij}$ is equally likely to be the first one chosen. Since
$|R_{ij}| = j - i + 1$, the probability of any given jug being the first one chosen
in $R_{ij}$ is $1/(j - i + 1)$. The remainder of the analysis is the same as the quicksort
analysis, and we arrive at the solution of $O(n \lg n)$ comparisons.

Just like in quicksort, in the worst case we always choose the largest (or smallest)
jug to partition the sets, which reduces the set sizes by only 1. The running
time then obeys the recurrence $T(n) = T(n - 1) + \Theta(n)$, and the number of
comparisons we make in the worst case is $T(n) = \Theta(n^2)$.


Chapter 9 overview

• ith order statistic is the ith smallest element of a set of n elements. 

• The minimum is the first order statistic (i = 1). 

• The maximum is the nth order statistic (i = n). 

• A median is the “halfway point” of the set. 

• When n is odd, the median is unique, at $i = (n + 1)/2$.

• When n is even, there are two medians:

• The lower median, at $i = n/2$, and

• The upper median, at $i = n/2 + 1$.

• We mean lower median when we use the phrase “the median.” 

The selection problem:

Input: A set A of n distinct numbers and a number i, with $1 \le i \le n$.

Output: The element $x \in A$ that is larger than exactly $i - 1$ other elements in A.
In other words, the ith smallest element of A.

The selection problem can be solved in $O(n \lg n)$ time.

• Sort the numbers using an $O(n \lg n)$-time algorithm, such as heapsort or merge
sort.

• Then return the ith element in the sorted array.

There are faster algorithms, however.

• First, we'll look at the problem of selecting the minimum and maximum of a
set of elements.

• Then, we'll look at a simple general selection algorithm with a time bound of
$O(n)$ in the average case.

• Finally, we'll look at a more complicated general selection algorithm with a
time bound of $O(n)$ in the worst case.


Minimum and maximum

We can easily obtain an upper bound of $n - 1$ comparisons for finding the minimum
of a set of n elements.

• Examine each element in turn and keep track of the smallest one. 

• This is the best we can do, because each element, except the minimum, must be 
compared to a smaller element at least once. 

The following pseudocode finds the minimum element in array $A[1 .. n]$:

Minimum(A, n)
min ← A[1]
for i ← 2 to n
    do if min > A[i]
          then min ← A[i]
return min

The maximum can be found in exactly the same way by replacing the > with < in
the above algorithm.


Simultaneous minimum and maximum 

Some applications need both the minimum and maximum of a set of elements. 

• For example, a graphics program may need to scale a set of (x, y ) data to fit 
onto a rectangular display. To do so, the program must first find the minimum 
and maximum of each coordinate. 

A simple algorithm to find the minimum and maximum is to find each one independently.
There will be $n - 1$ comparisons for the minimum and $n - 1$ comparisons
for the maximum, for a total of $2n - 2$ comparisons. This will result in $\Theta(n)$ time.

In fact, at most $3 \lfloor n/2 \rfloor$ comparisons are needed to find both the minimum and
maximum:

• Maintain the minimum and maximum of elements seen so far.

• Don't compare each element to the minimum and maximum separately.

• Process elements in pairs.

• Compare the elements of a pair to each other.

• Then compare the larger element to the maximum so far, and compare the
smaller element to the minimum so far.

This leads to only 3 comparisons for every 2 elements. 

Setting up the initial values for the min and max depends on whether n is odd or 
even. 

• If n is even, compare the first two elements and assign the larger to max and the 
smaller to min. Then process the rest of the elements in pairs. 

• If n is odd, set both min and max to the first element. Then process the rest of 
the elements in pairs. 





Lecture Notes for Chapter 9: Medians and Order Statistics 




Analysis of the total number of comparisons 

• If n is even, we do 1 initial comparison and then $3(n - 2)/2$ more comparisons.

\[
\begin{aligned}
\text{\# of comparisons} &= \frac{3(n - 2)}{2} + 1 \\
                         &= \frac{3n - 6}{2} + 1 \\
                         &= \frac{3n}{2} - 3 + 1 \\
                         &= \frac{3n}{2} - 2 .
\end{aligned}
\]

• If n is odd, we do $3(n - 1)/2 = 3 \lfloor n/2 \rfloor$ comparisons.

In either case, the maximum number of comparisons is $\le 3 \lfloor n/2 \rfloor$.

Selection in expected linear time 

Selection of the ith smallest element of the array A can be done in $\Theta(n)$ time.

The function Randomized-Select uses Randomized-Partition from the 
quicksort algorithm in Chapter 7. Randomized-Select differs from quicksort 
because it recurses on one side of the partition only. 

Randomized-Select(A, p, r, i)
if p = r
   then return A[p]
q ← Randomized-Partition(A, p, r)
k ← q - p + 1
if i = k                         ▷ pivot value is the answer
   then return A[q]
elseif i < k
   then return Randomized-Select(A, p, q - 1, i)
else return Randomized-Select(A, q + 1, r, i - k)
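
A Python sketch of the same procedure is given below. It is an illustration under assumptions: indices p and r are 0-based and inclusive, while the rank i stays 1-based as in the pseudocode; the partition routine is a Lomuto-style partition around a random pivot, which is an assumed stand-in for Randomized-Partition from Chapter 7. The input list is rearranged in place.

```python
import random

def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's final index."""
    s = random.randint(p, r)
    a[s], a[r] = a[r], a[s]          # move the random pivot to the end
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                    # rank of the pivot within a[p..r]
    if i == k:
        return a[q]                  # pivot value is the answer
    elif i < k:
        return randomized_select(a, p, q - 1, i)
    else:
        return randomized_select(a, q + 1, r, i - k)
```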

After the call to Randomized-Partition, the array is partitioned into two subarrays
$A[p .. q-1]$ and $A[q+1 .. r]$, along with a pivot element $A[q]$.

• The elements of subarray $A[p .. q-1]$ are all $\le A[q]$.

• The elements of subarray $A[q+1 .. r]$ are all $> A[q]$.

• The pivot element is the kth element of the subarray $A[p .. r]$, where $k =
q - p + 1$.

• If the pivot element is the ith smallest element (i.e., $i = k$), return $A[q]$.

• Otherwise, recurse on the subarray containing the ith smallest element.

• If $i < k$, this subarray is $A[p .. q-1]$, and we want the ith smallest element.

• If $i > k$, this subarray is $A[q+1 .. r]$ and, since there are k elements in
$A[p .. r]$ that precede $A[q+1 .. r]$, we want the $(i - k)$th smallest element
of this subarray.


Analysis

Worst-case running time: $\Theta(n^2)$, because we could be extremely unlucky and
always recurse on a subarray that is only 1 element smaller than the previous subarray.

Expected running time: Randomized-Select works well on average. Because
it is randomized, no particular input brings out the worst-case behavior consistently.

The running time of Randomized-Select is a random variable that we denote
by $T(n)$. We obtain an upper bound on $E[T(n)]$ as follows:

• Randomized-Partition is equally likely to return any element of A as the 
pivot. 

• For each k such that $1 \le k \le n$, the subarray $A[p .. q]$ has k elements (all $\le$
pivot) with probability $1/n$. [Note that we're now considering a subarray that
includes the pivot, along with elements less than the pivot.]

• For $k = 1, 2, \ldots, n$, define indicator random variable

$X_k = I\{\text{subarray } A[p .. q] \text{ has exactly } k \text{ elements}\}$ .

• Since $\Pr\{\text{subarray } A[p .. q] \text{ has exactly } k \text{ elements}\} = 1/n$, Lemma 5.1 says
that $E[X_k] = 1/n$.

• When we call Randomized-Select, we don't know if it will terminate immediately
with the correct answer, recurse on $A[p .. q-1]$, or recurse on
$A[q+1 .. r]$. It depends on whether the ith smallest element is less than, equal
to, or greater than the pivot element $A[q]$.

• To obtain an upper bound, we assume that $T(n)$ is monotonically increasing
and that the ith smallest element is always in the larger subarray.

• For a given call of Randomized-Select, $X_k = 1$ for exactly one value of k,
and $X_k = 0$ for all other k.

• When $X_k = 1$, the two subarrays have sizes $k - 1$ and $n - k$.

• For a subproblem of size n, Randomized-Partition takes $O(n)$ time. [Actually,
it takes $\Theta(n)$ time, but $O(n)$ suffices, since we're obtaining only an upper
bound on the expected running time.]

• Therefore, we have the recurrence

\[
\begin{aligned}
T(n) &\le \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr) \\
     &=   \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n) .
\end{aligned}
\]

• Taking expected values gives

\[
\begin{aligned}
E[T(n)] &\le E\Bigl[ \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n) \Bigr] \\
        &=   \sum_{k=1}^{n} E[X_k \cdot T(\max(k-1,\, n-k))] + O(n)
             && \text{(linearity of expectation)} \\
        &=   \sum_{k=1}^{n} E[X_k] \cdot E[T(\max(k-1,\, n-k))] + O(n)
             && \text{(equation (C.23))} \\
        &=   \sum_{k=1}^{n} \frac{1}{n} \cdot E[T(\max(k-1,\, n-k))] + O(n) .
\end{aligned}
\]


• We rely on $X_k$ and $T(\max(k-1,\, n-k))$ being independent random variables
in order to apply equation (C.23).

• Looking at the expression $\max(k-1,\, n-k)$, we have

\[
\max(k-1,\, n-k) =
\begin{cases}
k-1 & \text{if } k > \lceil n/2 \rceil , \\
n-k & \text{if } k \le \lceil n/2 \rceil .
\end{cases}
\]


• If n is even, each term from $T(\lceil n/2 \rceil)$ up to $T(n-1)$ appears exactly twice
in the summation.

• If n is odd, these terms appear twice and $T(\lfloor n/2 \rfloor)$ appears once.

• Either way,

\[
E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n) .
\]

• Solve this recurrence by substitution:

Guess that $T(n) \le cn$ for some constant c that satisfies the initial conditions
of the recurrence.

Assume that $T(n) = O(1)$ for n < some constant. We’ll pick this constant
later.

Also pick a constant a such that the function described by the $O(n)$ term is
bounded from above by an for all n > 0.

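Using this guess and constants c and a, we have

\[
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
        &=   \frac{2c}{n} \left( \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \right) + an \\
        &=   \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \right) + an \\
        &\le \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(n/2 - 2)(n/2 - 1)}{2} \right) + an \\
        &=   \frac{2c}{n} \left( \frac{n^2 - n}{2} - \frac{n^2/4 - 3n/2 + 2}{2} \right) + an \\
        &=   \frac{c}{n} \left( \frac{3n^2}{4} + \frac{n}{2} - 2 \right) + an \\
        &=   c \left( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \right) + an \\
        &\le \frac{3cn}{4} + \frac{c}{2} + an \\
        &=   cn - \left( \frac{cn}{4} - \frac{c}{2} - an \right) .
\end{aligned}
\]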

• To complete this proof, we choose c such that

\[
\begin{aligned}
cn/4 - c/2 - an &\ge 0 \\
cn/4 - an       &\ge c/2 \\
n(c/4 - a)      &\ge c/2 \\
n               &\ge \frac{c/2}{c/4 - a} \\
n               &\ge \frac{2c}{c - 4a} .
\end{aligned}
\]

(Dividing by $c/4 - a$ requires $c/4 - a > 0$, that is, $c > 4a$.)

• Thus, as long as we assume that $T(n) = O(1)$ for $n < 2c/(c - 4a)$, we have
$E[T(n)] = O(n)$.

Therefore, we can determine any order statistic in linear time on average. 
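
As a numerical sanity check of the substitution argument, the following Python sketch iterates the recurrence with equality. It rests on assumptions made here, not in the notes: the $O(n)$ term is taken to be exactly an with a = 1 by default, and the base value is t(0) = 0. The name recurrence_bound is hypothetical. Under these assumptions the iterate stays at or below 4an, which is consistent with the linear bound.

```python
def recurrence_bound(n_max, a=1.0):
    """Iterate t(n) = (2/n) * sum_{k=floor(n/2)}^{n-1} t(k) + a*n, with t(0) = 0.

    Returns the list [t(0), t(1), ..., t(n_max)].
    """
    t = [0.0] * (n_max + 1)
    # prefix[m] holds t(0) + t(1) + ... + t(m-1), so a range sum is one subtraction.
    prefix = [0.0] * (n_max + 2)
    for n in range(1, n_max + 1):
        window = prefix[n] - prefix[n // 2]   # sum of t(k) for k = floor(n/2) .. n-1
        t[n] = 2.0 * window / n + a * n
        prefix[n + 1] = prefix[n] + t[n]
    return t

if __name__ == "__main__":
    t = recurrence_bound(1000)
    print(max(t[n] / n for n in range(1, 1001)))   # largest observed ratio t(n)/n
```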


Selection in worst-case linear time 

We can find the ith smallest element in $O(n)$ time in the worst case. We’ll describe
a procedure Select that does so.

Select recursively partitions the input array. 

• Idea: Guarantee a good split when the array is partitioned. 

• Will use the deterministic procedure Partition, but with a small modification.
Instead of assuming that the last element of the subarray is the pivot, the
modified Partition procedure is told which element to use as the pivot.

Select works on an array of n > 1 elements. It executes the following steps: 

1. Divide the n elements into groups of 5. Get $\lceil n/5 \rceil$ groups: $\lfloor n/5 \rfloor$ groups with
exactly 5 elements and, if 5 does not divide n, one group with the remaining
n mod 5 elements.

2. Find the median of each of the $\lceil n/5 \rceil$ groups:

• Run insertion sort on each group. Takes $O(1)$ time per group since each
group has $\le 5$ elements.

• Then just pick the median from each group, in $O(1)$ time.

3. Find the median x of the $\lceil n/5 \rceil$ medians by a recursive call to Select. (If
$\lceil n/5 \rceil$ is even, then follow our convention and find the lower median.)

4. Using the modified version of Partition that takes the pivot element as input,
partition the input array around x. Let x be the kth element of the array after
partitioning, so that there are k - 1 elements on the low side of the partition and
n - k elements on the high side.

5. Now there are three possibilities:

• If i = k, just return x.

• If i < k, return the ith smallest element on the low side of the partition by 
making a recursive call to Select. 

• If i > k, return the (i — k)th smallest element on the high side of the partition 
by making a recursive call to Select. 
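
The five steps can be sketched in Python as follows. This is an illustration under assumptions made here, not the text's procedure verbatim: it partitions out of place into new lists instead of using the modified in-place Partition, it counts elements equal to x separately so that duplicate keys are handled, and it sorts directly once at most 5 elements remain. The name select is chosen for illustration.

```python
def select(a, i):
    """Return the i-th smallest element (1-indexed rank) of the list a."""
    a = list(a)
    if len(a) <= 5:
        return sorted(a)[i - 1]                  # small base case: sort directly
    # Steps 1 and 2: split into groups of 5 and take each group's lower median.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: the lower median of the medians, found recursively.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (out of place in this sketch).
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    equal = len(a) - len(low) - len(high)        # copies of x, at least 1
    # Step 5: return x or recurse on the side that contains rank i.
    if i <= len(low):
        return select(low, i)
    elif i <= len(low) + equal:
        return x
    else:
        return select(high, i - len(low) - equal)
```

Because x itself is excluded from both low and high, each recursive call is on a strictly smaller list, so the recursion terminates.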

Analysis


Start by getting a lower bound on the number of elements that are greater than the 
partitioning element x : 



[Each group is a column. Each white circle is the median of a group, as found 
in step 2. Arrows go from larger elements to smaller elements, based on what we 
know after step 4. Elements in the region on the lower right are known to be greater 
than x.] 


• At least half of the medians found in step 2 are $\ge x$.

• Look at the groups containing these medians that are $\ge x$. All of them
contribute 3 elements that are > x (the median of the group and the 2 elements
in the group greater than the group’s median), except for 2 of the groups: the
group containing x (which has only 2 elements > x) and the group with < 5
elements.

• Forget about these 2 groups. That leaves $\ge \lceil \frac{1}{2} \lceil n/5 \rceil \rceil - 2$ groups with
3 elements known to be > x.

• Thus, we know that at least

\[
3 \left( \left\lceil \frac{1}{2} \left\lceil \frac{n}{5} \right\rceil \right\rceil - 2 \right) \ge \frac{3n}{10} - 6
\]

elements are > x.

Symmetrically, the number of elements that are < x is at least $3n/10 - 6$.

Therefore, when we call Select recursively in step 5, it’s on $\le 7n/10 + 6$
elements.

Develop a recurrence for the worst-case running time of Select: 

• Steps 1, 2, and 4 each take $O(n)$ time:

• Step 1: making groups of 5 elements takes $O(n)$ time.

• Step 2: sorting $\lceil n/5 \rceil$ groups in $O(1)$ time each.

• Step 4: partitioning the n-element array around x takes $O(n)$ time.

• Step 3 takes time $T(\lceil n/5 \rceil)$.

• Step 5 takes time $\le T(7n/10 + 6)$, assuming that $T(n)$ is monotonically
increasing.

• Assume that $T(n) = O(1)$ for small enough n. We’ll use n < 140 as “small
enough.” Why 140? We’ll see why later.

• Thus, we get the recurrence

\[
T(n) \le
\begin{cases}
O(1) & \text{if } n < 140 , \\
T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n) & \text{if } n \ge 140 .
\end{cases}
\]

Solve this recurrence by substitution: 

• Inductive hypothesis: $T(n) \le cn$ for some constant c and all n > 0.

• Assume that c is large enough that $T(n) \le cn$ for all n < 140. So we are
concerned only with the case $n \ge 140$.

• Pick a constant a such that the function described by the $O(n)$ term in the
recurrence is $\le an$ for all n > 0.

• Substitute the inductive hypothesis in the right-hand side of the recurrence:

\[
\begin{aligned}
T(n) &\le c \lceil n/5 \rceil + c(7n/10 + 6) + an \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &=   9cn/10 + 7c + an \\
     &=   cn + (-cn/10 + 7c + an) .
\end{aligned}
\]


This last quantity is $\le cn$ if

\[
\begin{aligned}
-cn/10 + 7c + an &\le 0 \\
cn/10 - 7c       &\ge an \\
cn - 70c         &\ge 10an \\
c(n - 70)        &\ge 10an \\
c                &\ge 10a \, (n/(n - 70)) .
\end{aligned}
\]



• Because we assumed that $n \ge 140$, we have $n/(n - 70) \le 2$.

• Thus, $20a \ge 10a(n/(n - 70))$, so choosing $c \ge 20a$ gives $c \ge 10a(n/(n - 70))$,
which in turn gives us the condition we need to show that $T(n) \le cn$.

• We conclude that $T(n) = O(n)$, so that Select runs in linear time in all cases.

• Why 140? We could have used any integer strictly greater than 70.

• Observe that for n > 70, the fraction $n/(n - 70)$ decreases as n increases.

• We picked $n \ge 140$ so that the fraction would be $\le 2$, which is an easy
constant to work with.

• We could have picked, say, $n \ge 71$, so that for all $n \ge 71$, the fraction would
be $\le 71/(71 - 70) = 71$. Then the bound on $10a(n/(n - 70))$ would have been
710a instead of 20a, so we’d have needed to choose $c \ge 710a$.

Notice that Select and Randomized-Select determine information about the 
relative order of elements only by comparing elements. 

• Sorting requires $\Omega(n \lg n)$ time in the comparison model.

• Sorting algorithms that run in linear time need to make assumptions about their
input.

• Linear-time selection algorithms do not require any assumptions about their
input.

• Linear-time selection algorithms solve the selection problem without sorting
and therefore are not subject to the $\Omega(n \lg n)$ lower bound.


Solution to Exercise 9.1-1

The smallest of n numbers can be found with n - 1 comparisons by conducting a
tournament as follows: Compare all the numbers in pairs. Only the smaller of each
pair could possibly be the smallest of all n, so the problem has been reduced to that
of finding the smallest of $\lceil n/2 \rceil$ numbers. Compare those numbers in pairs, and so
on, until there’s just one number left, which is the answer.

To see that this algorithm does exactly n — 1 comparisons, notice that each number 
except the smallest loses exactly once. To show this more formally, draw a binary 
tree of the comparisons the algorithm does. The n numbers are the leaves, and each 
number that came out smaller in a comparison is the parent of the two numbers that 
were compared. Each non-leaf node of the tree represents a comparison, and there 
are n — 1 internal nodes in an n-leaf full binary tree (see Exercise (B.5-3)), so 
exactly n — 1 comparisons are made. 

In the search for the smallest number, the second smallest number must have come 
out smallest in every comparison made with it until it was eventually compared 
with the smallest. So the second smallest is among the elements that were
compared with the smallest during the tournament. To find it, conduct another
tournament (as above) to find the smallest of these numbers. At most $\lceil \lg n \rceil$ (the height
of the tree of comparisons) elements were compared with the smallest, so finding
the smallest of these takes $\lceil \lg n \rceil - 1$ comparisons in the worst case.

The total number of comparisons made in the two tournaments was
\[ n - 1 + \lceil \lg n \rceil - 1 = n + \lceil \lg n \rceil - 2 \]
in the worst case.
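
The two tournaments can be sketched in Python. This is a minimal illustration under assumptions made here: the function name second_smallest is hypothetical, a number without a partner in some round advances without a comparison, and each surviving number carries the list of values that lost to it directly. The function also returns how many comparisons it made, so the bound above can be checked.

```python
def second_smallest(nums):
    """Return (second smallest value, comparisons used) for a list of >= 2 numbers."""
    comparisons = 0
    # Each player is (value, list of values that lost directly to this value).
    players = [(v, []) for v in nums]
    while len(players) > 1:
        next_round = []
        for j in range(0, len(players) - 1, 2):
            (x, lost_to_x), (y, lost_to_y) = players[j], players[j + 1]
            comparisons += 1
            if x <= y:
                lost_to_x.append(y)
                next_round.append((x, lost_to_x))
            else:
                lost_to_y.append(x)
                next_round.append((y, lost_to_y))
        if len(players) % 2 == 1:
            next_round.append(players[-1])       # unpaired player advances for free
        players = next_round
    # Second tournament: the answer is the smallest value that lost to the winner.
    _, candidates = players[0]
    best = candidates[0]
    for v in candidates[1:]:
        comparisons += 1
        if v < best:
            best = v
    return best, comparisons
```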


Solution to Exercise 9.3-1 

For groups of 7, the algorithm still works in linear time. The number of elements
greater than x (and similarly, the number less than x) is at least

\[
4 \left( \left\lceil \frac{1}{2} \left\lceil \frac{n}{7} \right\rceil \right\rceil - 2 \right) \ge \frac{2n}{7} - 8 ,
\]

and the recurrence becomes

\[ T(n) \le T(\lceil n/7 \rceil) + T(5n/7 + 8) + O(n) , \]

which can be shown to be $O(n)$ by substitution, as for the groups of 5 case in the
text.

For groups of 3, however, the algorithm no longer works in linear time. The number
of elements greater than x, and the number of elements less than x, is at least

\[
2 \left( \left\lceil \frac{1}{2} \left\lceil \frac{n}{3} \right\rceil \right\rceil - 2 \right) \ge \frac{n}{3} - 4 ,
\]

and the recurrence becomes

\[ T(n) \le T(\lceil n/3 \rceil) + T(2n/3 + 4) + O(n) , \]

which does not have a linear solution.

We can prove that the worst-case time for groups of 3 is $\Omega(n \lg n)$. We do so by
deriving a recurrence for a particular case that takes $\Omega(n \lg n)$ time.

In counting up the number of elements greater than x (and similarly, the number
less than x), consider the particular case in which there are exactly
$\lceil \frac{1}{2} \lceil n/3 \rceil \rceil$ groups with medians $\ge x$ and in which the “leftover” group does
contribute 2 elements greater than x. Then the number of elements greater than x is
exactly $2(\lceil \frac{1}{2} \lceil n/3 \rceil \rceil - 1) + 1$ (the $-1$ discounts x’s group, as usual, and the
$+1$ is contributed by x’s group) $= 2\lceil n/6 \rceil - 1$, and the recursive step for
elements $\le x$ has $n - (2\lceil n/6 \rceil - 1) \ge n - (2(n/6 + 1) - 1) = 2n/3 - 1$ elements.
Observe also that the $O(n)$ term in the recurrence is really $\Theta(n)$, since the
partitioning in step 4 takes $\Theta(n)$ (not just $O(n)$) time. Thus, we get the recurrence

\[ T(n) \ge T(\lceil n/3 \rceil) + T(2n/3 - 1) + \Theta(n) \ge T(n/3) + T(2n/3 - 1) + \Theta(n) , \]

from which you can show that $T(n) \ge cn \lg n$ by substitution. You can also see
that $T(n)$ is nonlinear by noticing that each level of the recursion tree sums to n.
[In fact, any odd group size $\ge 5$ works in linear time.]


Solution to Exercise 9.3-3 

A modification to quicksort that allows it to run in $O(n \lg n)$ time in the worst case
uses the deterministic Partition algorithm that was modified to take an element
to partition around as an input parameter.

Select takes an array A, the bounds p and r of the subarray in A, and the rank i
of an order statistic, and in time linear in the size of the subarray A[p .. r] it returns
the ith smallest element in A[p .. r].

Best-Case-Quicksort(A, p, r)
if p < r
    then i ← ⌊(r - p + 1)/2⌋
         x ← Select(A, p, r, i)
         q ← Partition(x)
         Best-Case-Quicksort(A, p, q - 1)
         Best-Case-Quicksort(A, q + 1, r)

For an n-element array, the largest subarray that Best-Case-Quicksort recurses
on has n/2 elements. This situation occurs when n = r - p + 1 is even; then the
subarray A[q + 1 .. r] has n/2 elements, and the subarray A[p .. q - 1] has n/2 - 1
elements.

Because Best-Case-Quicksort always recurses on subarrays that are at most
half the size of the original array, the recurrence for the worst-case running time is
$T(n) \le 2T(n/2) + O(n) = O(n \lg n)$.


Solution to Exercise 9.3-5 

We assume that we are given a procedure Median that takes as parameters an
array A and subarray indices p and r, and returns the value of the median element of
A[p .. r] in $O(n)$ time in the worst case.

Given Median, here is a linear-time algorithm Select' for finding the ith smallest
element in A[p .. r]. This algorithm uses the deterministic Partition algorithm
that was modified to take an element to partition around as an input parameter.

Select'(A, p, r, i)
if p = r
    then return A[p]
x ← Median(A, p, r)
q ← Partition(x)
k ← q - p + 1
if i = k
    then return A[q]
elseif i < k
    then return Select'(A, p, q - 1, i)
else return Select'(A, q + 1, r, i - k)

Because x is the median of A[p .. r], each of the subarrays A[p .. q - 1] and
A[q + 1 .. r] has at most half the number of elements of A[p .. r]. The recurrence
for the worst-case running time of Select' is $T(n) \le T(n/2) + O(n) = O(n)$.


Solution to Exercise 9.3-8 

Let’s start out by supposing that the median (the lower median, since we know we
have an even number of elements) is in X. Let’s call the median value m, and let’s
suppose that it’s in X[k]. Then k elements of X are less than or equal to m and
n - k elements of X are greater than or equal to m. We know that in the two arrays
combined, there must be n elements less than or equal to m and n elements greater
than or equal to m, and so there must be n - k elements of Y that are less than or
equal to m and n - (n - k) = k elements of Y that are greater than or equal to m.






Solutions for Chapter 9: Medians and Order Statistics 


Thus, we can check that X[k] is the lower median by checking whether
$Y[n - k] \le X[k] \le Y[n - k + 1]$. A boundary case occurs for k = n. Then n - k = 0, and
there is no array entry Y[0]; we only need to check that $X[n] \le Y[1]$.

Now, if the median is in X but is not in X[k], then the above condition will not
hold. If the median is in X[k'], where k' < k, then X[k] is above the median, and
Y[n - k + 1] < X[k]. Conversely, if the median is in X[k''], where k'' > k, then
X[k] is below the median, and X[k] < Y[n - k].

Thus, we can use a binary search to determine whether there is an X[k] such that
either k < n and $Y[n - k] \le X[k] \le Y[n - k + 1]$ or k = n and $X[k] \le Y[n - k + 1]$;
if we find such an X[k], then it is the median. Otherwise, we know that the median
is in Y, and we use a binary search to find a Y[k] such that either k < n and
$X[n - k] \le Y[k] \le X[n - k + 1]$ or k = n and $Y[k] \le X[n - k + 1]$; such a
Y[k] is the median. Since each binary search takes $O(\lg n)$ time, we spend a total
of $O(\lg n)$ time.

Here’s how we write the algorithm in pseudocode: 

Two-Array-Median(X, Y)
n ← length[X]        ▷ n also equals length[Y]
median ← Find-Median(X, Y, n, 1, n)
if median = NOT-FOUND
    then median ← Find-Median(Y, X, n, 1, n)
return median

Find-Median(A, B, n, low, high)
if low > high
    then return NOT-FOUND
    else k ← ⌊(low + high)/2⌋
         if k = n and A[n] ≤ B[1]
             then return A[n]
         elseif k < n and B[n - k] ≤ A[k] ≤ B[n - k + 1]
             then return A[k]
         elseif A[k] > B[n - k + 1]
             then return Find-Median(A, B, n, low, k - 1)
         else return Find-Median(A, B, n, k + 1, high)
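
A Python rendering of this pseudocode is sketched below, under assumptions made here: the inputs are two sorted lists of equal length n, Python's 0-based indexing is translated from the 1-based array entries (X[k] becomes x[k - 1]), None stands in for NOT-FOUND, and the tail recursion is written as a loop. The names two_array_median and find_median are chosen for illustration.

```python
def find_median(a, b, n, low, high):
    """Binary search a[low..high] (1-based ranks) for the lower median of a and b."""
    while low <= high:
        k = (low + high) // 2
        ak = a[k - 1]                              # A[k] in 1-based terms
        if k == n:
            if ak <= b[0]:                         # A[n] <= B[1]
                return ak
            high = k - 1                           # A[n] > B[1]: search lower ranks
        elif b[n - k - 1] <= ak <= b[n - k]:       # B[n-k] <= A[k] <= B[n-k+1]
            return ak
        elif ak > b[n - k]:                        # A[k] > B[n-k+1]: A[k] is too large
            high = k - 1
        else:                                      # A[k] < B[n-k]: A[k] is too small
            low = k + 1
    return None                                    # NOT-FOUND

def two_array_median(x, y):
    """Return the lower median of the 2n values in sorted lists x and y."""
    n = len(x)                                     # assumed to equal len(y)
    median = find_median(x, y, n, 1, n)
    if median is None:
        median = find_median(y, x, n, 1, n)
    return median
```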


Solution to Exercise 9.3-9 

In order to find the optimal placement for Professor Olay’s pipeline, we need only 
find the median(s) of the y-coordinates of his oil wells, as the following proof 
explains. 

Claim 

The optimal y-coordinate for Professor Olay’s east-west oil pipeline is as follows: 

• If n is even, then on either the oil well whose y-coordinate is the lower median 
or the one whose y-coordinate is the upper median, or anywhere between them. 

• If n is odd, then on the oil well whose y-coordinate is the median. 

Proof We examine various cases. In each case, we will start out with the pipeline
at a particular y-coordinate and see what happens when we move it. We’ll denote
by s the sum of the north-south spurs with the pipeline at the starting location,
and s' will denote the sum after moving the pipeline.

We start with the case in which n is even. Let us start with the pipeline somewhere
on or between the two oil wells whose y-coordinates are the lower and upper
medians. If we move the pipeline by a vertical distance d without crossing either of
the median wells, then n/2 of the wells become d farther from the pipeline and
n/2 become d closer, and so s' = s + dn/2 - dn/2 = s; thus, all locations on or
between the two medians are equally good.

Now suppose that the pipeline goes through the oil well whose y-coordinate is the 
upper median. What happens when we increase the y-coordinate of the pipeline 
by d > 0 units, so that it moves above the oil well that achieves the upper median? 
All oil wells whose y-coordinates are at or below the upper median become d units 
farther from the pipeline, and there are at least n/2 + 1 such oil wells (the upper 
median, and every well at or below the lower median). There are at most n/2 — 1 
oil wells whose y-coordinates are above the upper median, and each of these oil 
wells becomes at most d units closer to the pipeline when it moves up. Thus, we
have a lower bound on s' of $s' \ge s + d(n/2 + 1) - d(n/2 - 1) = s + 2d > s$.
We conclude that moving the pipeline up from the oil well at the upper median 
increases the total spur length. A symmetric argument shows that if we start with 
the pipeline going through the oil well whose y-coordinate is the lower median and 
move it down, then the total spur length increases. 

We see, therefore, that when n is even, an optimal placement of the pipeline is 
anywhere on or between the two medians. 

Now we consider the case when n is odd. We start with the pipeline going through
the oil well whose y-coordinate is the median, and we consider what happens when
we move it up by d > 0 units. All oil wells at or below the median become d units
farther from the pipeline, and there are at least (n + 1)/2 such wells (the one at the
median and the (n - 1)/2 at or below the median). There are at most (n - 1)/2 oil
wells above the median, and each of these becomes at most d units closer to the
pipeline. We get a lower bound on s' of $s' \ge s + d(n + 1)/2 - d(n - 1)/2 =
s + d > s$, and we conclude that moving the pipeline up from the oil well at the
median increases the total spur length. A symmetric argument shows that moving
the pipeline down from the median also increases the total spur length, and so the
optimal placement of the pipeline is on the median. ■ (claim)

Since we know we are looking for the median, we can use the linear-time
median-finding algorithm.
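
The claim can be checked on small inputs with a short Python sketch. The names pipeline_y and total_spur_length are hypothetical, and statistics.median_low is used as a stand-in for the linear-time median-finding algorithm: it returns the lower median, but it sorts its input, so this sketch does not itself run in linear time.

```python
import statistics

def pipeline_y(ys):
    """Return an optimal y-coordinate for the pipeline: the lower median of ys."""
    return statistics.median_low(ys)

def total_spur_length(ys, y):
    """Sum of the north-south spur lengths if the pipeline runs at height y."""
    return sum(abs(v - y) for v in ys)

if __name__ == "__main__":
    wells = [3, 10, 1, 7, 7, 2]
    y = pipeline_y(wells)
    print(y, total_spur_length(wells, y))
```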


Solution to Problem 9-1 

We assume that the numbers start out in an array. 

a. Sort the numbers using merge sort or heapsort, which take $\Theta(n \lg n)$ worst-case
time. (Don’t use quicksort or insertion sort, which can take $\Theta(n^2)$ time.) Put
the i largest elements (directly accessible in the sorted array) into the output
array, taking $\Theta(i)$ time.

Total worst-case running time: $\Theta(n \lg n + i) = \Theta(n \lg n)$ (because $i \le n$).

b. Implement the priority queue as a heap. Build the heap using Build-Heap,
which takes $\Theta(n)$ time, then call Heap-Extract-Max i times to get the i
largest elements, in $\Theta(i \lg n)$ worst-case time, and store them in reverse order
of extraction in the output array. The worst-case extraction time is $\Theta(i \lg n)$
because

• i extractions from a heap with $O(n)$ elements takes $i \cdot O(\lg n) = O(i \lg n)$
time, and

• half of the i extractions are from a heap with $\ge n/2$ elements, so those i/2
extractions take $(i/2)\,\Omega(\lg(n/2)) = \Omega(i \lg n)$ time in the worst case.

Total worst-case running time: $\Theta(n + i \lg n)$.

c. Use the Select algorithm of Section 9.3 to find the ith largest number in $\Theta(n)$
time. Partition around that number in $\Theta(n)$ time. Sort the i largest numbers in
$\Theta(i \lg i)$ worst-case time (with merge sort or heapsort).

Total worst-case running time: $\Theta(n + i \lg i)$.

Note that method (c) is always asymptotically at least as good as the other two
methods, and that method (b) is asymptotically at least as good as (a). (Comparing
(c) to (b) is easy, but it is less obvious how to compare (c) and (b) to (a).
(c) and (b) are asymptotically at least as good as (a) because n, i lg i, and i lg n are
all $O(n \lg n)$. The sum of two things that are $O(n \lg n)$ is also $O(n \lg n)$.)
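
Methods (a) and (b) can be sketched in Python with the standard library. The function names are hypothetical. Two substitutions are made here: Python's built-in sort stands in for merge sort or heapsort in method (a), and heapq, which provides a min-heap, is used on negated values in place of Build-Heap and Heap-Extract-Max in method (b). Both functions return the i largest numbers in increasing order.

```python
import heapq

def largest_by_sorting(nums, i):
    """Method (a): sort everything, then take the last i elements."""
    ordered = sorted(nums)
    return ordered[len(ordered) - i:]

def largest_by_heap(nums, i):
    """Method (b): build a max-heap, then extract the maximum i times."""
    heap = [-v for v in nums]          # negate so the min-heap acts as a max-heap
    heapq.heapify(heap)                # linear-time heap construction
    extracted = [-heapq.heappop(heap) for _ in range(i)]   # largest first
    extracted.reverse()                # store in reverse order of extraction
    return extracted
```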


Solution to Problem 9-2 


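a. The median x of the elements $x_1, x_2, \ldots, x_n$ is an element $x = x_k$ satisfying
$|\{x_i : 1 \le i \le n \text{ and } x_i < x\}| < n/2$ and $|\{x_i : 1 \le i \le n \text{ and } x_i > x\}| \le n/2$.
If each element $x_i$ is assigned a weight $w_i = 1/n$, then we get

\[
\begin{aligned}
\sum_{x_i < x} w_i &= \sum_{x_i < x} \frac{1}{n} \\
                   &= \frac{1}{n} \cdot \sum_{x_i < x} 1 \\
                   &= \frac{1}{n} \cdot |\{x_i : 1 \le i \le n \text{ and } x_i < x\}| \\
                   &< \frac{1}{n} \cdot \frac{n}{2} \\
                   &= \frac{1}{2} ,
\end{aligned}
\]

and

\[
\begin{aligned}
\sum_{x_i > x} w_i &= \sum_{x_i > x} \frac{1}{n} \\
                   &= \frac{1}{n} \cdot \sum_{x_i > x} 1 \\
                   &= \frac{1}{n} \cdot |\{x_i : 1 \le i \le n \text{ and } x_i > x\}| \\
                   &\le \frac{1}{n} \cdot \frac{n}{2} \\
                   &= \frac{1}{2} ,
\end{aligned}
\]

which proves that x is also the weighted median of $x_1, x_2, \ldots, x_n$ with weights
$w_i = 1/n$, for i = 1, 2, ..., n.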


b. We first sort the n elements into increasing order by $x_i$ values. Then we scan
the array of sorted $x_i$’s, starting with the smallest element and accumulating
weights as we scan, until the total exceeds 1/2. The last element, say $x_k$, whose
weight caused the total to exceed 1/2, is the weighted median. Notice that the
total weight of all elements smaller than $x_k$ is less than 1/2, because $x_k$ was
the first element that caused the total weight to exceed 1/2. Similarly, the total
weight of all elements larger than $x_k$ is also less than 1/2, because the total
weight of all the other elements exceeds 1/2.

The sorting phase can be done in $O(n \lg n)$ worst-case time (using merge sort
or heapsort), and the scanning phase takes $O(n)$ time. The total running time
in the worst case, therefore, is $O(n \lg n)$.
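
The sort-and-scan method of part (b) can be sketched in Python. Several choices here are assumptions of this sketch: the name weighted_median is hypothetical, the $x_i$ are taken to be distinct, and the weights are positive and sum to 1 (Fraction values keep the comparison with 1/2 exact). The scan stops as soon as the running total reaches 1/2, so that the total weight below the returned element is strictly less than 1/2 and the total weight above it is at most 1/2. When no prefix of the sorted weights sums to exactly 1/2, this is the same element at which the total first exceeds 1/2.

```python
from fractions import Fraction

def weighted_median(xs, ws):
    """Return the weighted median of distinct values xs with weights ws.

    The weights are assumed to be positive and to sum to 1.
    """
    half = Fraction(1, 2)
    total = Fraction(0)
    for x, w in sorted(zip(xs, ws)):   # scan in increasing order of x
        total += w
        if total >= half:              # running weight has reached 1/2
            return x
    raise ValueError("weights must sum to 1")
```

With equal weights 1/n, the function returns the lower median, in agreement with part (a).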


c. We find the weighted median in $\Theta(n)$ worst-case time using the $\Theta(n)$ worst-case
median algorithm in Section 9.3. (Although the first paragraph of the
section only claims an $O(n)$ upper bound, it is easy to see that the more precise
running time of $\Theta(n)$ applies as well, since steps 1, 2, and 4 of Select actually
take $\Theta(n)$ time.)

The weighted-median algorithm works as follows. If $n \le 2$, we just return
the brute-force solution. Otherwise, we proceed as follows. We find the actual
median $x_k$ of the n elements and then partition around it. We then compute the
total weights of the two halves. If the weights of the two halves are each strictly
less than 1/2, then the weighted median is $x_k$. Otherwise, the weighted median
should be in the half with total weight exceeding 1/2. The total weight of the
“light” half is lumped into the weight of $x_k$, and the search continues within the
half that weighs more than 1/2. Here’s pseudocode, which takes as input a set
$X = \{x_1, x_2, \ldots, x_n\}$:

Weighted-Median(X)
if n = 1
    then return x_1
elseif n = 2
    then if w_1 ≥ w_2
             then return x_1
             else return x_2
else
    find the median x_k of X = {x_1, x_2, ..., x_n}
    partition the set X around x_k
    compute W_L = Σ_{x_i < x_k} w_i and W_G = Σ_{x_i > x_k} w_i
    if W_L < 1/2 and W_G < 1/2
        then return x_k
    elseif W_L > 1/2
        then w_k ← w_k + W_G
             X' ← {x_i ∈ X : x_i ≤ x_k}
             return Weighted-Median(X')
    else w_k ← w_k + W_L
         X' ← {x_i ∈ X : x_i ≥ x_k}
         return Weighted-Median(X')

The recurrence for the worst-case running time of Weighted-Median is
$T(n) = T(n/2 + 1) + \Theta(n)$, since there is at most one recursive call on half the
number of elements, plus the median element $x_k$, and all the work preceding the
recursive call takes $\Theta(n)$ time. The solution of the recurrence is $T(n) = \Theta(n)$.

d. Let the n points be denoted by their coordinates $x_1, x_2, \ldots, x_n$, let the
corresponding weights be $w_1, w_2, \ldots, w_n$, and let $x = x_k$ be the weighted median.
For any point p, let $f(p) = \sum_{i=1}^{n} w_i |p - x_i|$; we want to find a point p such
that $f(p)$ is minimum. Let y be any point (real number) other than x. We show
the optimality of the weighted median x by showing that $f(y) - f(x) \ge 0$. We
examine separately the cases in which y > x and x > y. For any x and y, we
have

\[
\begin{aligned}
f(y) - f(x) &= \sum_{i=1}^{n} w_i |y - x_i| - \sum_{i=1}^{n} w_i |x - x_i| \\
            &= \sum_{i=1}^{n} w_i \bigl( |y - x_i| - |x - x_i| \bigr) .
\end{aligned}
\]

When y > x, we bound the quantity $|y - x_i| - |x - x_i|$ from below by
examining three cases:

1. $x < y \le x_i$: Here, $|x - y| + |y - x_i| = |x - x_i|$ and $|x - y| = y - x$, which
imply that $|y - x_i| - |x - x_i| = -|x - y| = x - y$.

2. $x < x_i \le y$: Here, $|y - x_i| \ge 0$ and $|x_i - x| \le y - x$, which imply that
$|y - x_i| - |x - x_i| \ge -(y - x) = x - y$.

3. $x_i \le x < y$: Here, $|x - x_i| + |y - x| = |y - x_i|$ and $|y - x| = y - x$, which
imply that $|y - x_i| - |x - x_i| = |y - x| = y - x$.

Separating out the first two cases, in which $x < x_i$, from the third case, in which
$x \ge x_i$, we get

\[
\begin{aligned}
f(y) - f(x) &=   \sum_{i=1}^{n} w_i \bigl( |y - x_i| - |x - x_i| \bigr) \\
            &\ge \sum_{x < x_i} w_i (x - y) + \sum_{x \ge x_i} w_i (y - x) \\
            &=   (y - x) \left( \sum_{x \ge x_i} w_i - \sum_{x < x_i} w_i \right) .
\end{aligned}
\]

The property that $\sum_{x < x_i} w_i \le 1/2$ implies that $\sum_{x \ge x_i} w_i \ge 1/2$, because the
weights sum to 1. These two facts, combined with y - x > 0, yield that
$f(y) - f(x) \ge 0$.

When x > y, we again bound the quantity $|y - x_i| - |x - x_i|$ from below by
examining three cases:

1. $x_i \le y < x$: Here, $|y - x_i| + |x - y| = |x - x_i|$ and $|x - y| = x - y$, which
imply that $|y - x_i| - |x - x_i| = -|x - y| = y - x$.

2. $y \le x_i < x$: Here, $|y - x_i| \ge 0$ and $|x - x_i| \le x - y$, which imply that
$|y - x_i| - |x - x_i| \ge -(x - y) = y - x$.

3. $y < x \le x_i$: Here, $|x - y| + |x - x_i| = |y - x_i|$ and $|x - y| = x - y$, which
imply that $|y - x_i| - |x - x_i| = |x - y| = x - y$.

Separating out the first two cases, in which $x > x_i$, from the third case, in which
$x \le x_i$, we get

\[
\begin{aligned}
f(y) - f(x) &=   \sum_{i=1}^{n} w_i \bigl( |y - x_i| - |x - x_i| \bigr) \\
            &\ge \sum_{x > x_i} w_i (y - x) + \sum_{x \le x_i} w_i (x - y) \\
            &=   (x - y) \left( \sum_{x \le x_i} w_i - \sum_{x > x_i} w_i \right) .
\end{aligned}
\]

The property that $\sum_{x > x_i} w_i < 1/2$ implies that $\sum_{x \le x_i} w_i > 1/2$, because the
weights sum to 1. These two facts, combined with x - y > 0, yield that
$f(y) - f(x) > 0$.

e. We are given n 2-dimensional points $p_1, p_2, \ldots, p_n$, where each $p_i$ is a pair of
real numbers $p_i = (x_i, y_i)$, and positive weights $w_1, w_2, \ldots, w_n$. The goal is
to find a point $p = (x, y)$ that minimizes the sum

\[
f(x, y) = \sum_{i=1}^{n} w_i \bigl( |x - x_i| + |y - y_i| \bigr) .
\]

We can express the cost function of the two variables, $f(x, y)$, as the sum of
two functions of one variable each: $f(x, y) = g(x) + h(y)$, where
$g(x) = \sum_{i=1}^{n} w_i |x - x_i|$, and $h(y) = \sum_{i=1}^{n} w_i |y - y_i|$. The goal of finding a point
$p = (x, y)$ that minimizes the value of $f(x, y)$ can be achieved by treating
each dimension independently, because g does not depend on y and h does not
depend on x. Thus,

\[
\begin{aligned}
\min_{x, y} f(x, y) &= \min_{x, y} \bigl( g(x) + h(y) \bigr) \\
                    &= \min_{x} \Bigl( \min_{y} \bigl( g(x) + h(y) \bigr) \Bigr) \\
                    &= \min_{x} \Bigl( g(x) + \min_{y} h(y) \Bigr) \\
                    &= \min_{x} g(x) + \min_{y} h(y) .
\end{aligned}
\]

Consequently, finding the best location in 2 dimensions can be done by finding
the weighted median $x_k$ of the x-coordinates and then finding the weighted
median $y_j$ of the y-coordinates. The point $(x_k, y_j)$ is an optimal solution for
the 2-dimensional post-office location problem.


Solution to Problem 9-3 

a. Our algorithm relies on a particular property of Select: that not only does it
return the ith smallest element, but that it also partitions the input array so that
the first i positions contain the i smallest elements (though not necessarily in
sorted order). To see that Select has this property, observe that there are only
two ways in which it returns a value: when n = 1, and when immediately after
partitioning in step 4, it finds that there are exactly i elements on the low side
of the partition.

Taking the hint from the book, here is our modified algorithm to select the ith
smallest element of n elements. Whenever it is called with $i \ge n/2$, it just calls
Select and returns its result; in this case, $U_i(n) = T(n)$.

When i < n/2, our modified algorithm works as follows. Assume that the input
is in a subarray A[p + 1 .. p + n], and let $m = \lfloor n/2 \rfloor$. In the initial call, p = 1.

1. Divide the input as follows. If n is even, divide the input into two parts:
A[p + 1 .. p + m] and A[p + m + 1 .. p + n]. If n is odd, divide the input
into three parts: A[p + 1 .. p + m], A[p + m + 1 .. p + n - 1], and A[p + n]
as a leftover piece.

2. Compare A[p + i] and A[p + i + m] for i = 1, 2, ..., m, putting the smaller
of the two elements into A[p + i + m] and the larger into A[p + i].

3. Recursively find the ith smallest element in A[p + m + 1 .. p + n], but with an
additional action performed by the partitioning procedure: whenever it
exchanges A[j] and A[k] (where $p + m + 1 \le j, k \le p + 2m$), it also exchanges
A[j - m] and A[k - m]. The idea is that after recursively finding the ith smallest
element in A[p + m + 1 .. p + n], the subarray A[p + m + 1 .. p + m + i]
contains the i smallest elements that had been in A[p + m + 1 .. p + n] and
the subarray A[p + 1 .. p + i] contains their larger counterparts, as found in
step 2. The ith smallest element of A[p + 1 .. p + n] must be either one of
the i smallest, as placed into A[p + m + 1 .. p + m + i], or it must be one of
the larger counterparts, as placed into A[p + 1 .. p + i].

4. Collect the subarrays A[p + 1 .. p + i] and A[p + m + 1 .. p + m + i] into
a single array B[1 .. 2i], call Select to find the ith smallest element of B,
and return the result of this call to Select.

The number of comparisons in each step is as follows: 
1. No comparisons.

2. $m = \lfloor n/2 \rfloor$ comparisons.

3. Since we recurse on A[p + m + 1 .. p + n], which has $\lceil n/2 \rceil$ elements, the
number of comparisons is $U_i(\lceil n/2 \rceil)$.

4. Since we call Select on an array with 2i elements, the number of comparisons
is $T(2i)$.

Thus, when i < n/2, the total number of comparisons is
$\lfloor n/2 \rfloor + U_i(\lceil n/2 \rceil) + T(2i)$.


b. We show by substitution that if i < n /2, then U, in) = n + 0(T (2 i ) lg (n //)). In 
particular - , we shall show that U,(n) < n + cT(2i) lg(/i/ i) — d (lg lg n)T(2i) = 
n + cT(2i) lg n — cT(2i)lgi — d tig lg n)T(2i) for some positive constant c, 
some positive constant d to be chosen later, and n > 4. We have 
Ui(n) = \n/2\ + Uj([n/2~}) + T(2i) 

< L«/2J + rn/21 + cT (2 i) lg \n/2~\ - cT (2 i) lg i 

~d (lglg \n/2~\)T(2i) 

= n + cT(2i)lg[n/2] -cT(2i)\gi - d(\g\g \n/2 1 \)T{2i) 

< n + cT (2/)lg(n/2 + 1) - cT(2i)\gi - d(\glg(n/2))T(2i) 

= n + cT (2/) lg(«/2 + 1) — cT (21) lg i - d(lg(lgn - l))T(2i) 

< n + cT(2i) Ign— cT(2i) lg i — d(lglgn)T(2i) 

if cT (2i) lg(n/2 + 1) — c? (lg (lg « - l))r(2/) < cT(2i)lgn - d (lglg n)T (2i). 
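Simple algebraic manipulations give the following sequence of equivalent
conditions:

\begin{align*}
cT(2i) \lg(n/2 + 1) - d(\lg(\lg n - 1)) T(2i) &\le cT(2i) \lg n - d(\lg\lg n) T(2i) \\
c \lg(n/2 + 1) - d(\lg(\lg n - 1)) &\le c \lg n - d(\lg\lg n) \\
c(\lg(n/2 + 1) - \lg n) &\le d(\lg(\lg n - 1) - \lg\lg n) \\
c \lg\left(\frac{n/2 + 1}{n}\right) &\le d \lg\left(\frac{\lg n - 1}{\lg n}\right) \\
c \lg\left(\frac{1}{2} + \frac{1}{n}\right) &\le d \lg\left(\frac{\lg n - 1}{\lg n}\right)
\end{align*}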


Observe that 1/2 + 1/n decreases as n increases, but (lg n − 1)/lg n increases as
n increases. When n = 4, we have 1/2 + 1/n = 3/4 and (lg n − 1)/lg n = 1/2.
Thus, we just need to choose d such that c lg(3/4) ≤ d lg(1/2) or, equivalently,
c lg(3/4) ≤ −d. Multiplying both sides by −1, we get d ≤ −c lg(3/4) =
c lg(4/3). Thus, any value of d that is at most c lg(4/3) suffices.


c. When i is a constant, T(2i) = O(1) and lg(n/i) = lg n − lg i = O(lg n). Thus,
when i is a constant less than n/2, we have that

\begin{align*}
U_i(n) &= n + O(T(2i) \lg(n/i)) \\
&= n + O(O(1) \cdot O(\lg n)) \\
&= n + O(\lg n) .
\end{align*}

d. Suppose that i = n/k for k ≥ 2. Then i ≤ n/2. If k > 2, then i < n/2, and we
have

\begin{align*}
U_i(n) &= n + O(T(2i) \lg(n/i)) \\
&= n + O(T(2n/k) \lg(n/(n/k))) \\
&= n + O(T(2n/k) \lg k) .
\end{align*}

If k = 2, then n = 2i and lg k = 1. We have

\begin{align*}
U_i(n) &= T(n) \\
&= n + (T(n) - n) \\
&\le n + (T(2i) - n) \\
&= n + (T(2n/k) - n) \\
&= n + (T(2n/k) \lg k - n) \\
&= n + O(T(2n/k) \lg k) .
\end{align*}
Chapter 11 overview

Many applications require a dynamic set that supports only the dictionary
operations Insert, Search, and Delete. Example: a symbol table in a compiler.

A hash table is effective for implementing a dictionary.

• The expected time to search for an element in a hash table is O(1), under some
reasonable assumptions.

• Worst-case search time is Θ(n), however.

A hash table is a generalization of an ordinary array. 

• With an ordinary array, we store the element whose key is k in position k of the 
array. 

• Given a key k, we find the element whose key is k by just looking in the kth 
position of the array. This is called direct addressing. 

• Direct addressing is applicable when we can afford to allocate an array with 
one position for every possible key. 

We use a hash table when we do not want to (or cannot) allocate an array with one
position per possible key.

• Use a hash table when the number of keys actually stored is small relative to 
the number of possible keys. 

• A hash table is an array, but it typically uses a size proportional to the number 
of keys to be stored (rather than the number of possible keys). 

• Given a key k, don’t just use k as the index into the array. 

• Instead, compute a function of k, and use that value to index into the array. We 
call this function a hash function. 

Issues that we’ll explore in hash tables: 

• How to compute hash functions. We’ll look at the multiplication and division 
methods. 

• What to do when the hash function maps multiple keys to the same table entry. 
We’ll look at chaining and open addressing. 
Direct-address tables

Scenario:

• Maintain a dynamic set.

• Each element has a key drawn from a universe U = {0, 1, ..., m − 1} where
m isn’t too large.

• No two elements have the same key.

Represent by a direct-address table, or array, T[0 .. m − 1]:

• Each slot, or position, corresponds to a key in U.

• If there’s an element x with key k, then T[k] contains a pointer to x.

• Otherwise, T[k] is empty, represented by NIL.


[Figure: direct-address table T.]



Dictionary operations are trivial and take O(1) time each:

Direct-Address-Search(T, k)
    return T[k]

Direct-Address-Insert(T, x)
    T[key[x]] ← x

Direct-Address-Delete(T, x)
    T[key[x]] ← NIL
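The three operations translate almost line for line into code. The following is a minimal Python sketch, assuming each element is an object with an integer key in the range 0 .. m − 1; the names Element and DirectAddressTable are illustrative and do not come from these notes.

```python
from dataclasses import dataclass


@dataclass
class Element:
    key: int          # assumed to lie in 0 .. m-1
    data: str = ""    # satellite data (illustrative)


class DirectAddressTable:
    def __init__(self, m):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * m

    def search(self, k):
        # Direct-Address-Search: return T[k]
        return self.slots[k]

    def insert(self, x):
        # Direct-Address-Insert: T[key[x]] <- x
        self.slots[x.key] = x

    def delete(self, x):
        # Direct-Address-Delete: T[key[x]] <- NIL
        self.slots[x.key] = None
```

Each method performs a single array access, which is where the O(1) bound comes from; the cost is the Θ(m) space for the slot array.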


Hash tables 


The problem with direct addressing is that if the universe U is large, storing a
table of size |U| may be impractical or impossible.

Often, the set K of keys actually stored is small compared to U, so that most of
the space allocated for T is wasted.






Lecture Notes for Chapter 11: Hash Tables 


11-3 


• When K is much smaller than U, a hash table requires much less space than a
direct-address table.

• Can reduce storage requirements to Θ(|K|).

• Can still get O(1) search time, but in the average case, not the worst case.

Idea: Instead of storing an element with key k in slot k, use a function h and store
the element in slot h(k).

• We call h a hash function.

• h : U → {0, 1, ..., m − 1}, so that h(k) is a legal slot number in T.

• We say that k hashes to slot h(k).

Collisions: When two or more keys hash to the same slot.

• Can happen when there are more possible keys than slots (|U| > m).

• For a given set K of keys with |K| ≤ m, may or may not happen. Definitely
happens if |K| > m.

• Therefore, must be prepared to handle collisions in all cases. 

• Use two methods: chaining and open addressing. 

• Chaining is usually better than open addressing. We’ll examine both. 

Collision resolution by chaining 

Put all elements that hash to the same slot into a linked list. 


[Figure: hash table T with collisions resolved by chaining.]



[This figure shows singly linked lists. If we want to delete elements, it’s better to 
use doubly linked lists.] 

• Slot j contains a pointer to the head of the list of all stored elements that hash
to j [or to the sentinel if using a circular, doubly linked list with a sentinel].

• If there are no such elements, slot j contains NIL. 
How to implement dictionary operations with chaining:

• Insertion:

Chained-Hash-Insert(T, x)
    insert x at the head of list T[h(key[x])]

• Worst-case running time is O(1).

• Assumes that the element being inserted isn’t already in the list.

• It would take an additional search to check if it was already inserted.

• Search:

Chained-Hash-Search(T, k)
    search for an element with key k in list T[h(k)]

Running time is proportional to the length of the list of elements in slot h(k).

• Deletion:

Chained-Hash-Delete(T, x)
    delete x from the list T[h(key[x])]

• Given pointer x to the element to delete, so no search is needed to find this
element.

• Worst-case running time is O(1) if the lists are doubly linked.

• If the lists are singly linked, then deletion takes as long as searching,
because we must find x’s predecessor in its list in order to correctly update next
pointers.
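The chaining operations above can be sketched in Python with explicit doubly linked nodes, so that deletion given a pointer to the node really is O(1). This is only an illustration: the class names Node and ChainedHashTable, and the default hash function k mod m, are assumptions made for the example and are not part of the notes.

```python
class Node:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class ChainedHashTable:
    def __init__(self, m, h=None):
        self.m = m
        # Hypothetical default hash function: the division method.
        self.h = h if h is not None else (lambda k: k % m)
        self.heads = [None] * m   # heads[j] is the head of list j, or None (NIL)

    def insert(self, x):
        # O(1): splice x in at the head of its list.
        # Assumes x is not already in the table.
        j = self.h(x.key)
        x.prev = None
        x.next = self.heads[j]
        if x.next is not None:
            x.next.prev = x
        self.heads[j] = x

    def search(self, k):
        # Time proportional to the length of the list in slot h(k).
        x = self.heads[self.h(k)]
        while x is not None and x.key != k:
            x = x.next
        return x

    def delete(self, x):
        # O(1) given a pointer to x, because the list is doubly linked.
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.heads[self.h(x.key)] = x.next
        if x.next is not None:
            x.next.prev = x.prev
        x.prev = x.next = None
```

With singly linked nodes, delete would first have to walk the list to find x's predecessor, which is the search cost the notes mention.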

Analysis of hashing with chaining 

Given a key, how long does it take to find an element with that key, or to determine 
that there is no element with that key? 

• Analysis is in terms of the load factor α = n/m:

• n = # of elements in the table.

• m = # of slots in the table = # of (possibly empty) linked lists.

• Load factor is average number of elements per linked list.

• Can have α < 1, α = 1, or α > 1.

• Worst case is when all n keys hash to the same slot ⇒ get a single list of length n
⇒ worst-case time to search is Θ(n), plus time to compute hash function.

• Average case depends on how well the hash function distributes the keys among
the slots.

We focus on average-case performance of hashing with chaining.

• Assume simple uniform hashing: any given element is equally likely to hash
into any of the m slots.





• For j = 0, 1, ..., m − 1, denote the length of list T[j] by n_j. Then
n = n_0 + n_1 + ··· + n_{m−1}.

• Average value of n_j is E[n_j] = α = n/m.

• Assume that we can compute the hash function in O(1) time, so that the time
required to search for the element with key k depends on the length n_{h(k)} of the
list T[h(k)].

We consider two cases: 

• If the hash table contains no element with key k, then the search is unsuccessful.

• If the hash table does contain an element with key k, then the search is
successful.

[In the theorem statements that follow, we omit the assumptions that we’re
resolving collisions by chaining and that simple uniform hashing applies.]

Unsuccessful search:

Theorem

An unsuccessful search takes expected time Θ(1 + α).

Proof Simple uniform hashing ⇒ any key not already in the table is equally likely
to hash to any of the m slots.

To search unsuccessfully for any key k, need to search to the end of the list T[h(k)].
This list has expected length E[n_{h(k)}] = α. Therefore, the expected number of
elements examined in an unsuccessful search is α.

Adding in the time to compute the hash function, the total time required is
Θ(1 + α). ■

Successful search: 

• The expected time for a successful search is also Θ(1 + α).

• The circumstances are slightly different from an unsuccessful search.

• The probability that each list is searched is proportional to the number of
elements it contains.

Theorem

A successful search takes expected time Θ(1 + α).

Proof Assume that the element x being searched for is equally likely to be any of
the n elements stored in the table.

The number of elements examined during a successful search for x is 1 more than
the number of elements that appear before x in x’s list. These are the elements
inserted after x was inserted (because we insert at the head of the list).

So we need to find the average, over the n elements x in the table, of how many
elements were inserted into x’s list after x was inserted.

For i = 1, 2, ..., n, let x_i be the ith element inserted into the table, and let
k_i = key[x_i].
For all i and j, define indicator random variable X_ij = I{h(k_i) = h(k_j)}.

Simple uniform hashing ⇒ Pr{h(k_i) = h(k_j)} = 1/m ⇒ E[X_ij] = 1/m (by
Lemma 5.1).

Expected number of elements examined in a successful search is

\begin{align*}
E\left[\frac{1}{n} \sum_{i=1}^{n} \left(1 + \sum_{j=i+1}^{n} X_{ij}\right)\right]
&= \frac{1}{n} \sum_{i=1}^{n} \left(1 + \sum_{j=i+1}^{n} E[X_{ij}]\right) && \text{(linearity of expectation)} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left(1 + \sum_{j=i+1}^{n} \frac{1}{m}\right) \\
&= 1 + \frac{1}{nm} \sum_{i=1}^{n} (n - i) \\
&= 1 + \frac{1}{nm} \left(n^2 - \frac{n(n+1)}{2}\right) && \text{(equation (A.1))} \\
&= 1 + \frac{n - 1}{2m} \\
&= 1 + \frac{\alpha}{2} - \frac{\alpha}{2n} .
\end{align*}

Adding in the time for computing the hash function, we get that the expected total
time for a successful search is Θ(2 + α/2 − α/2n) = Θ(1 + α).


Alternative analysis, using indicator random variables even more:

For each slot l and for each pair of keys k_i and k_j, define the indicator random
variable X_ijl = I{the search is for x_i, h(k_i) = l, and h(k_j) = l}. X_ijl = 1 when
keys k_i and k_j collide at slot l and when we are searching for x_i.

Simple uniform hashing ⇒ Pr{h(k_i) = l} = 1/m and Pr{h(k_j) = l} = 1/m.
Also have Pr{the search is for x_i} = 1/n. These events are all independent ⇒
Pr{X_ijl = 1} = 1/nm^2 ⇒ E[X_ijl] = 1/nm^2 (by Lemma 5.1).

Define, for each element x_j, the indicator random variable

Y_j = I{x_j appears in a list prior to the element being searched for}.

Y_j = 1 if and only if there is some slot l that has both elements x_i and x_j in its list,
and also i < j (so that x_i appears after x_j in the list). Therefore,

\[ Y_j = \sum_{i=1}^{j-1} \sum_{l=0}^{m-1} X_{ijl} . \]

One final random variable: Z, which counts how many elements appear in the list
prior to the element being searched for: Z = Σ_{j=1}^{n} Y_j. We must count the element
being searched for as well as all those preceding it in its list ⇒ compute E[Z + 1]:

\begin{align*}
E[Z + 1] &= E\left[1 + \sum_{j=1}^{n} Y_j\right] \\
&= 1 + E\left[\sum_{j=1}^{n} \sum_{i=1}^{j-1} \sum_{l=0}^{m-1} X_{ijl}\right] && \text{(linearity of expectation)} \\
&= 1 + \sum_{j=1}^{n} \sum_{i=1}^{j-1} \sum_{l=0}^{m-1} E[X_{ijl}] && \text{(linearity of expectation)} \\
&= 1 + \sum_{j=1}^{n} \sum_{i=1}^{j-1} \sum_{l=0}^{m-1} \frac{1}{nm^2} \\
&= 1 + \frac{n(n-1)}{2} \cdot m \cdot \frac{1}{nm^2} \\
&= 1 + \frac{n - 1}{2m} \\
&= 1 + \frac{n}{2m} - \frac{1}{2m} \\
&= 1 + \frac{\alpha}{2} - \frac{\alpha}{2n} .
\end{align*}

Adding in the time for computing the hash function, we get that the expected total
time for a successful search is Θ(2 + α/2 − α/2n) = Θ(1 + α). ■


Interpretation: If n = O(m), then α = n/m = O(m)/m = O(1), which means
that searching takes constant time on average.

Since insertion takes O(1) worst-case time and deletion takes O(1) worst-case
time when the lists are doubly linked, all dictionary operations take O(1) time on
average.
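The closed form 1 + α/2 − α/2n from the successful-search analysis can be checked with exact rational arithmetic. The sketch below evaluates the averaged sum directly, using E[X_ij] = 1/m, and compares it with the closed form; the function names are chosen only for this illustration.

```python
from fractions import Fraction


def expected_examined(n, m):
    """Average over i of 1 + sum_{j=i+1}^{n} E[X_ij], with E[X_ij] = 1/m."""
    total = Fraction(0)
    for i in range(1, n + 1):
        # Exactly n - i indices j satisfy i < j <= n.
        total += 1 + Fraction(n - i, m)
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - alpha/(2n), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - alpha / (2 * n)
```

For example, with n = 4 and m = 2 both functions give 7/4. Because the arithmetic is exact, equality for a range of n and m is a genuine check of the algebra rather than a floating-point approximation.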


Hash functions 

We discuss some issues regarding hash-function design and present schemes for 
hash function creation. 


What makes a good hash function? 

• Ideally, the hash function satisfies the assumption of simple uniform hashing. 

• In practice, it’s not possible to satisfy this assumption, since we don’t know in 
advance the probability distribution that keys are drawn from, and the keys may 
not be drawn independently. 

• Often use heuristics, based on the domain of the keys, to create a hash function 
that performs well. 
Keys as natural numbers

• Hash functions assume that the keys are natural numbers.

• When they’re not, have to interpret them as natural numbers.

• Example: Interpret a character string as an integer expressed in some radix
notation. Suppose the string is CLRS:

• ASCII values: C = 67, L = 76, R = 82, S = 83.

• There are 128 basic ASCII values.

• So interpret CLRS as (67 · 128^3) + (76 · 128^2) + (82 · 128^1) + (83 · 128^0) =
141,764,947.
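The radix interpretation is easy to compute with Horner's rule. A small Python sketch follows; the function name string_to_natural is illustrative, and the code assumes every character is basic ASCII (code below 128).

```python
def string_to_natural(s, radix=128):
    """Interpret s as a radix-`radix` integer, one character per digit."""
    value = 0
    for ch in s:
        # Horner's rule: shift the digits so far by one place, add the next.
        value = value * radix + ord(ch)
    return value
```

Calling string_to_natural("CLRS") reproduces the value 141,764,947 computed above.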

Division method 

h(k) = k mod m.

Example: m = 20 and k = 91 ⇒ h(k) = 11.

Advantage: Fast, since requires just one division operation.

Disadvantage: Have to avoid certain values of m:

• Powers of 2 are bad. If m = 2^p for integer p, then h(k) is just the least
significant p bits of k.

• If k is a character string interpreted in radix 2^p (as in CLRS example), then
m = 2^p − 1 is bad: permuting characters in a string does not change its hash
value (Exercise 11.3-3).

Good choice for m: A prime not too close to an exact power of 2.
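Both pitfalls are easy to observe in code. The sketch below assumes strings are interpreted in radix 2^p with p = 7 (radix 128); the helper names division_hash and string_key are made up for this example.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def string_key(s, p=7):
    """Radix-2**p interpretation of an ASCII string (p = 7 gives radix 128)."""
    value = 0
    for ch in s:
        value = (value << p) + ord(ch)
    return value
```

With m = 2^7 − 1 = 127, the strings "CLRS" and "SRLC" (the same characters in a different order) both hash to 54, because 128 mod 127 = 1 and the hash reduces to the sum of the character codes modulo 127. With m = 8, division_hash(k, 8) equals k & 7, that is, just the low 3 bits of k.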

Multiplication method 

1. Choose constant A in the range 0 < A < 1. 

2. Multiply key k by A. 

3. Extract the fractional part of kA. 

4. Multiply the fractional part by m. 

5. Take the floor of the result. 

Put another way, h(k) = ⌊m (kA mod 1)⌋, where kA mod 1 = kA − ⌊kA⌋ =
fractional part of kA.

Disadvantage: Slower than division method. 

Advantage: Value of m is not critical. 

(Relatively) easy implementation: 

• Choose m = 2^p for some integer p.

• Let the word size of the machine be w bits.

• Assume that k fits into a single word. (k takes w bits.)

• Let s be an integer in the range 0 < s < 2^w. (s takes w bits.)

• Restrict A to be of the form s/2^w.

[Figure: the w-bit words k and s are multiplied to give the 2w-bit product r_1 2^w + r_0.]

• Multiply k by s.

• Since we’re multiplying two w-bit words, the result is 2w bits, r_1 2^w + r_0, where
r_1 is the high-order word of the product and r_0 is the low-order word.

• r_1 holds the integer part of kA (⌊kA⌋) and r_0 holds the fractional part of kA
(kA mod 1 = kA − ⌊kA⌋). Think of the “binary point” (analog of decimal
point, but for binary representation) as being between r_1 and r_0. Since we don’t
care about the integer part of kA, we can forget about r_1 and just use r_0.

• Since we want ⌊m (kA mod 1)⌋, we could get that value by shifting r_0 to the
left by p = lg m bits and then taking the p bits that were shifted to the left of
the binary point.

• We don’t need to shift. The p bits that would have been shifted to the left of the
binary point are the p most significant bits of r_0. So we can just take these bits
after having formed r_0 by multiplying k by s.

• Example: m = 8 (implies p = 3), w = 5, k = 21. Must have 0 < s < 2^5;
choose s = 13 ⇒ A = 13/32.

• Using just the formula to compute h(k): kA = 21 · 13/32 = 273/32 = 8 17/32
⇒ kA mod 1 = 17/32 ⇒ m (kA mod 1) = 8 · 17/32 = 17/4 = 4 1/4 ⇒
⌊m (kA mod 1)⌋ = 4, so that h(k) = 4.

• Using the implementation: ks = 21 · 13 = 273 = 8 · 2^5 + 17 ⇒ r_1 = 8,
r_0 = 17. Written in w = 5 bits, r_0 = 10001. Take the p = 3 most significant
bits of r_0, get 100 in binary, or 4 in decimal, so that h(k) = 4.
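The bit-level implementation and the formula can be compared directly. In the Python sketch below, multiplication_hash follows the word-based steps and formula_hash evaluates ⌊m (kA mod 1)⌋ with exact fractions; both function names are chosen for this illustration, and the code assumes p ≤ w.

```python
from fractions import Fraction
from math import floor


def multiplication_hash(k, s, w, p):
    """h(k) with A = s / 2**w and m = 2**p, using only integer operations."""
    r0 = (k * s) & ((1 << w) - 1)   # low-order w bits of the 2w-bit product
    return r0 >> (w - p)            # the p most significant bits of r0


def formula_hash(k, s, w, p):
    """h(k) = floor(m * (k*A mod 1)), evaluated with exact rationals."""
    A = Fraction(s, 2 ** w)
    fractional_part = k * A - floor(k * A)
    return floor(2 ** p * fractional_part)
```

For the example above (k = 21, s = 13, w = 5, p = 3), both functions return 4.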

How to choose A: 

• The multiplication method works with any legal value of A. 

• But it works better with some values than with others, depending on the keys 
being hashed. 

• Knuth suggests using A ≈ (√5 − 1)/2.


Universal hashing 

[We just touch on universal hashing in these notes. See the book for a full
treatment.]

Suppose that a malicious adversary, who gets to choose the keys to be hashed, has
seen your hashing program and knows the hash function in advance. Then he could
choose keys that all hash to the same slot, giving worst-case behavior.

One way to defeat the adversary is to use a different hash function each time. You
choose one at random at the beginning of your program. Unless the adversary
knows how you’ll be randomly choosing which hash function to use, he cannot
intentionally defeat you.

Just because we choose a hash function randomly, that doesn’t mean it’s a good
hash function. What we want is to randomly choose a single hash function from a
set of good candidates.

Consider a finite collection ℋ of hash functions that map a universe U of keys into
the range {0, 1, ..., m − 1}. ℋ is universal if for each pair of keys k, l ∈ U, where
k ≠ l, the number of hash functions h ∈ ℋ for which h(k) = h(l) is ≤ |ℋ|/m.

Put another way, ℋ is universal if, with a hash function h chosen randomly
from ℋ, the probability of a collision between two different keys is no more than
the 1/m chance of just choosing two slots randomly and independently.

Why are universal hash functions good? 

• They give good hashing behavior: 

Theorem 

Using chaining and universal hashing on key k:

• If k is not in the table, the expected length E[n_{h(k)}] of the list that k hashes
to is ≤ α.

• If k is in the table, the expected length E[n_{h(k)}] of the list that holds k is
≤ 1 + α.

Corollary

Using chaining and universal hashing, the expected time for each Search
operation is O(1).

• They are easy to design. 

[See book for details of behavior and design of a universal class of hash functions.] 
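These notes do not describe a concrete universal class, so the following Python sketch borrows the family presented in the textbook's section on universal hashing: h_{a,b}(k) = ((ak + b) mod p) mod m, where p is a prime larger than every key, 1 ≤ a < p, and 0 ≤ b < p. The function names are illustrative, and the brute-force check is practical only for very small p.

```python
import random


def make_universal_hash(p, m, rng=random):
    """Pick h_{a,b}(k) = ((a*k + b) mod p) mod m at random.

    Assumes p is a prime and every key lies in 0 .. p-1.
    """
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m


def max_collision_count(p, m):
    """Largest number of (a, b) pairs under which some fixed pair of
    distinct keys collides, found by exhaustive enumeration."""
    worst = 0
    for k in range(p):
        for l in range(k + 1, p):
            count = sum(
                1
                for a in range(1, p)
                for b in range(p)
                if ((a * k + b) % p) % m == ((a * l + b) % p) % m
            )
            worst = max(worst, count)
    return worst
```

The family has p(p − 1) members, so the definition of universality requires max_collision_count(p, m) ≤ p(p − 1)/m. Choosing the function once at program start, as make_universal_hash does, is what denies the adversary advance knowledge of the hash function.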


Open addressing 

An alternative to chaining for handling collisions. 

Idea: 

• Store all keys in the hash table itself. 

• Each slot contains either a key or NIL. 

• To search for key k: 

• Compute h(k) and examine slot h(k). Examining a slot is known as a probe. 

• If slot h(k) contains key k, the search is successful. If this slot contains NIL, 
the search is unsuccessful. 

• There’s a third possibility: slot h(k) contains a key that is not k. We compute
the index of some other slot, based on k and on which probe (count from 0:
0th, 1st, 2nd, etc.) we’re on.

• Keep probing until we either find key k (successful search) or we find a slot
holding NIL (unsuccessful search).

• We need the sequence of slots probed to be a permutation of the slot numbers
⟨0, 1, ..., m − 1⟩ (so that we examine all slots if we have to, and so that we
don’t examine any slot more than once).

• Thus, the hash function is h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}, where
the second argument is the probe number and the result is the slot number.

• The requirement that the sequence of slots be a permutation of ⟨0, 1, ...,
m − 1⟩ is equivalent to requiring that the probe sequence ⟨h(k, 0), h(k, 1),
..., h(k, m − 1)⟩ be a permutation of ⟨0, 1, ..., m − 1⟩.

• To insert, act as though we’re searching, and insert at the first NIL slot we find. 

Pseudocode for searching: 

Hash-Search(T, k)
    i ← 0
    repeat j ← h(k, i)
           if T[j] = k
               then return j
           i ← i + 1
    until T[j] = NIL or i = m
    return NIL

Hash-Search returns the index of a slot containing key k, or NIL if the search is
unsuccessful.

Pseudocode for insertion: 

Hash-Insert(T, k)
    i ← 0
    repeat j ← h(k, i)
           if T[j] = NIL
               then T[j] ← k
                    return j
               else i ← i + 1
    until i = m
    error “hash table overflow”

Hash-Insert returns the number of the slot that gets key k, or it flags a “hash
table overflow” error if there is no empty slot in which to put key k.

Deletion: Cannot just put NIL into the slot containing the key we want to delete. 

• Suppose we want to delete key k in slot j.

• And suppose that sometime after inserting key k, we were inserting key k′, and
during this insertion we had probed slot j (which contained key k).

• And suppose we then deleted key k by storing NIL into slot j.

• And then we search for key k′.

• During the search, we would probe slot j before probing the slot into which
key k′ was eventually stored.

• Thus, the search would be unsuccessful, even though key k′ is in the table.

Solution: Use a special value DELETED instead of NIL when marking a slot as
empty during deletion.

• Search should treat DELETED as though the slot holds a key that does not match
the one being searched for.

• Insertion should treat DELETED as though the slot were empty, so that it can be
reused.

The disadvantage of using DELETED is that now search time is no longer dependent
on the load factor α.
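The DELETED convention can be sketched in Python as follows. This is an illustration under stated assumptions: the class name OpenAddressTable is made up, keys are assumed to be integers (never None), and the default probe function is linear probing on k mod m.

```python
NIL = None
DELETED = object()   # sentinel distinct from every key and from NIL


class OpenAddressTable:
    def __init__(self, m, h=None):
        self.m = m
        self.slots = [NIL] * m
        # Hypothetical default probe function: linear probing on k mod m.
        self.h = h if h is not None else (lambda k, i: (k + i) % m)

    def search(self, k):
        for i in range(self.m):
            j = self.h(k, i)
            if self.slots[j] is NIL:
                return None            # never-used slot: k is absent
            if self.slots[j] is not DELETED and self.slots[j] == k:
                return j
        return None

    def insert(self, k):
        for i in range(self.m):
            j = self.h(k, i)
            if self.slots[j] is NIL or self.slots[j] is DELETED:
                self.slots[j] = k      # reuse empty or deleted slots
                return j
        raise OverflowError("hash table overflow")

    def delete(self, k):
        j = self.search(k)
        if j is not None:
            self.slots[j] = DELETED    # not NIL, so later probes continue past j
        return j
```

In a table of size 5, inserting 3 and then 8 places 8 in slot 4 after a probe of slot 3. Deleting 3 marks slot 3 as DELETED, so a later search for 8 still probes past slot 3 and finds it; had slot 3 been reset to NIL, that search would have stopped early.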


How to compute probe sequences 

The ideal situation is uniform hashing: each key is equally likely to have any of
the m! permutations of ⟨0, 1, ..., m − 1⟩ as its probe sequence. (This generalizes
simple uniform hashing for a hash function that produces a whole probe sequence
rather than just a single number.)

It’s hard to implement true uniform hashing, so we approximate it with techniques
that at least guarantee that the probe sequence is a permutation of ⟨0, 1, ..., m − 1⟩.

None of these techniques can produce all m! probe sequences. They will make use
of auxiliary hash functions, which map U → {0, 1, ..., m − 1}.

Linear probing: Given auxiliary hash function h′, the probe sequence starts at
slot h′(k) and continues sequentially through the table, wrapping after slot m − 1
to slot 0.

Given key k and probe number i (0 ≤ i < m), h(k, i) = (h′(k) + i) mod m.

The initial probe determines the entire sequence ⇒ only m possible sequences.

Linear probing suffers from primary clustering: long runs of occupied slots
build up. And long runs tend to get longer, since an empty slot preceded by i full
slots gets filled next with probability (i + 1)/m. Result is that the average search
and insertion times increase.

Quadratic probing: As in linear probing, the probe sequence starts at h′(k).
Unlike linear probing, it jumps around in the table according to a quadratic function
of the probe number: h(k, i) = (h′(k) + c_1 i + c_2 i^2) mod m, where c_1, c_2 ≠ 0 are
constants.

Must constrain c_1, c_2, and m in order to ensure that we get a full permutation
of ⟨0, 1, ..., m − 1⟩. (Problem 11-3 explores one way to implement quadratic
probing.)

Can get secondary clustering: if two distinct keys have the same h′ value, then
they have the same probe sequence.
Double hashing: Use two auxiliary hash functions, h_1 and h_2. h_1 gives the initial
probe, and h_2 gives the remaining probes: h(k, i) = (h_1(k) + i h_2(k)) mod m.
Must have h_2(k) be relatively prime to m (no factors in common other than 1) in
order to guarantee that the probe sequence is a full permutation of ⟨0, 1, ..., m − 1⟩.

• Could choose m to be a power of 2 and h_2 to always produce an odd number
> 1.

• Could let m be prime and have 1 < h_2(k) < m.

Θ(m^2) different probe sequences, since each possible combination of h_1(k)
and h_2(k) gives a different probe sequence.
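Linear probing and double hashing can be written as small probe-function builders, which makes the permutation requirement easy to test. In the Python sketch below the builder names are illustrative; the auxiliary functions suggested in the usage note (h_1(k) = k mod 13 and h_2(k) = 2 + (k mod 11)) are assumptions picked so that m = 13 is prime and 1 < h_2(k) < m.

```python
def linear_probe(h1, m):
    """h(k, i) = (h1(k) + i) mod m."""
    return lambda k, i: (h1(k) + i) % m


def double_hash_probe(h1, h2, m):
    """h(k, i) = (h1(k) + i * h2(k)) mod m; h2(k) must be relatively prime to m."""
    return lambda k, i: (h1(k) + i * h2(k)) % m


def probe_sequence(h, k, m):
    """The full probe sequence <h(k, 0), h(k, 1), ..., h(k, m - 1)>."""
    return [h(k, i) for i in range(m)]
```

With m = 13 and the auxiliary functions above, sorting probe_sequence(h, k, 13) yields 0 through 12 for every key under either scheme, confirming that each sequence is a permutation. Under linear probing, two keys with the same h_1 value share the whole sequence, whereas under double hashing they share it only if their h_2 values also match.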


Analysis of open-address hashing 
Assumptions: 

• Analysis is in terms of load factor α. We will assume that the table never
completely fills, so we always have 0 ≤ n < m ⇒ 0 ≤ α < 1.

• Assume uniform hashing.

• No deletion.

• In a successful search, each key is equally likely to be searched for.

Theorem

The expected number of probes in an unsuccessful search is at most 1/(1 − α).

Proof Since the search is unsuccessful, every probe is to an occupied slot, except
for the last probe, which is to an empty slot.

Define random variable X = # of probes made in an unsuccessful search.

Define events A_i, for i = 1, 2, ..., to be the event that there is an ith probe and that
it’s to an occupied slot.

X ≥ i if and only if probes 1, 2, ..., i − 1 are made and are to occupied slots ⇒
Pr{X ≥ i} = Pr{A_1 ∩ A_2 ∩ ··· ∩ A_{i−1}}.

By Exercise C.2-6,

Pr{A_1 ∩ A_2 ∩ ··· ∩ A_{i−1}} = Pr{A_1} · Pr{A_2 | A_1} · Pr{A_3 | A_1 ∩ A_2} ···
Pr{A_{i−1} | A_1 ∩ A_2 ∩ ··· ∩ A_{i−2}}.


Claim

Pr{A_j | A_1 ∩ A_2 ∩ ··· ∩ A_{j−1}} = (n − j + 1)/(m − j + 1). Boundary case: j = 1
⇒ Pr{A_1} = n/m.

Proof For the boundary case j = 1, there are n stored keys and m slots, so the
probability that the first probe is to an occupied slot is n/m.

Given that j − 1 probes were made, all to occupied slots, the assumption of uniform
hashing says that the probe sequence is a permutation of ⟨0, 1, ..., m − 1⟩, which
in turn implies that the next probe is to a slot that we have not yet probed. There are
m − j + 1 slots remaining, n − j + 1 of which are occupied. Thus, the probability
that the jth probe is to an occupied slot is (n − j + 1)/(m − j + 1). ■ (claim)


Using this claim,





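\[
\Pr\{X \ge i\} = \underbrace{\frac{n}{m} \cdot \frac{n-1}{m-1} \cdot \frac{n-2}{m-2} \cdots \frac{n-i+2}{m-i+2}}_{i-1 \text{ factors}} .
\]

n < m ⇒ (n − j)/(m − j) ≤ n/m for j ≥ 0, which implies

\[ \Pr\{X \ge i\} \le \left(\frac{n}{m}\right)^{i-1} = \alpha^{i-1} . \]

By equation (C.24),

\begin{align*}
E[X] &= \sum_{i=1}^{\infty} \Pr\{X \ge i\} \\
&\le \sum_{i=1}^{\infty} \alpha^{i-1} \\
&= \sum_{i=0}^{\infty} \alpha^{i} \\
&= \frac{1}{1 - \alpha} .
\end{align*}

■ (theorem)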


Interpretation: If α is constant, an unsuccessful search takes O(1) time.

• If α = 0.5, then an unsuccessful search takes an average of 1/(1 − 0.5) = 2
probes.

• If α = 0.9, takes an average of 1/(1 − 0.9) = 10 probes.


Corollary

The expected number of probes to insert is at most 1/(1 − α).

Proof Since there is no deletion, insertion uses the same probe sequence as an
unsuccessful search. ■
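The bound for unsuccessful search can be compared with the exact expectation, obtained by summing Pr{X ≥ i} as the product of ratios (n − j)/(m − j) from the claim. The Python sketch below uses exact fractions; the function name is illustrative, and the code assumes 0 ≤ n < m.

```python
from fractions import Fraction


def expected_unsuccessful_probes(n, m):
    """E[X] = sum over i >= 1 of Pr{X >= i} under uniform hashing (n < m)."""
    expected = Fraction(0)
    prob = Fraction(1)                 # Pr{X >= 1} = 1 (empty product)
    for i in range(1, n + 2):          # Pr{X >= i} = 0 once i > n + 1
        expected += prob
        # Multiply in the next factor to get Pr{X >= i + 1}.
        prob *= Fraction(n - (i - 1), m - (i - 1))
    return expected
```

For instance, with n = 3 and m = 5 the exact expectation is 2 probes, while the bound 1/(1 − α) = m/(m − n) gives 2.5. The bound is loose for small tables because it replaces every factor (n − j)/(m − j) by the larger value n/m.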


Theorem

The expected number of probes in a successful search is at most (1/α) ln (1/(1 − α)).


Proof A successful search for key k follows the same probe sequence as when
key k was inserted.

By the previous corollary, if k was the (i + 1)st key inserted, then α equaled i/m
at the time. Thus, the expected number of probes made in a search for k is at most
1/(1 − i/m) = m/(m − i).


That was assuming that k was the (i + 1)st key inserted. We need to average over
all n keys:

\begin{align*}
\frac{1}{n} \sum_{i=0}^{n-1} \frac{m}{m-i} &= \frac{m}{n} \sum_{i=0}^{n-1} \frac{1}{m-i} \\
&= \frac{1}{\alpha} (H_m - H_{m-n}) ,
\end{align*}

where H_i = Σ_{j=1}^{i} 1/j is the ith harmonic number.

Simplify by using the technique of bounding a summation by an integral:

\begin{align*}
\frac{1}{\alpha} (H_m - H_{m-n}) &= \frac{1}{\alpha} \sum_{j=m-n+1}^{m} \frac{1}{j} \\
&\le \frac{1}{\alpha} \int_{m-n}^{m} (1/x) \, dx && \text{(inequality (A.12))} \\
&= \frac{1}{\alpha} \ln \frac{m}{m-n} \\
&= \frac{1}{\alpha} \ln \frac{1}{1-\alpha} .
\end{align*}

■ (theorem)
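The two forms of the successful-search bound can be compared numerically. The Python sketch below computes the averaged sum (1/n) Σ m/(m − i) exactly and the logarithmic bound in floating point; the function names are illustrative, and the code assumes 1 ≤ n < m.

```python
from fractions import Fraction
from math import log


def successful_probe_bound_exact(n, m):
    """(1/n) * sum_{i=0}^{n-1} m / (m - i): the bound before the integral step."""
    return sum(Fraction(m, m - i) for i in range(n)) / n


def successful_probe_bound_log(n, m):
    """(1/alpha) * ln(1 / (1 - alpha)), with alpha = n/m."""
    alpha = n / m
    return (1 / alpha) * log(1 / (1 - alpha))
```

At α = 0.5 the logarithmic bound is about 1.39 probes and at α = 0.9 it is about 2.56 probes, noticeably fewer than the 2 and 10 probes bounding an unsuccessful search at the same load factors.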
Solution to Exercise 11.1-4

We denote the huge array by T and, taking the hint from the book, we also have a
stack implemented by an array S. The size of S equals the number of keys actually
stored, so that S should be allocated at the dictionary’s maximum size. The stack
has an attribute top[S], so that only entries S[1 .. top[S]] are valid.

The idea of this scheme is that entries of T and S validate each other. If key k is
actually stored in T, then T[k] contains the index, say j, of a valid entry in S, and
S[j] contains the value k. Let us call this situation, in which 1 ≤ T[k] ≤ top[S],
S[T[k]] = k, and T[S[j]] = j, a validating cycle.

Assuming that we also need to store pointers to objects in our direct-address table,
we can store them in an array that is parallel to either T or S. Since S is smaller
than T, we’ll use an array S′, allocated to be the same size as S, for these pointers.
Thus, if the dictionary contains an object x with key k, then there is a validating
cycle and S′[T[k]] points to x.

The operations on the dictionary work as follows:

• Initialization: Simply set top[S] = 0, so that there are no valid entries in the
stack.

• Search: Given key k, we check whether we have a validating cycle, i.e.,
whether 1 ≤ T[k] ≤ top[S] and S[T[k]] = k. If so, we return S′[T[k]],
and otherwise we return NIL.

• Insert: To insert object x with key k, assuming that this object is not already
in the dictionary, we increment top[S], set S[top[S]] ← k, set S′[top[S]] ← x,
and set T[k] ← top[S].

• Delete: To delete object x with key k, assuming that this object is in the
dictionary, we need to break the validating cycle. The trick is to also ensure
that we don’t leave a “hole” in the stack, and we solve this problem by moving
the top entry of the stack into the position that we are vacating, and then fixing
up that entry’s validating cycle. That is, we execute the following sequence of
assignments:

S[T[k]] ← S[top[S]]
S′[T[k]] ← S′[top[S]]
T[S[T[k]]] ← T[k]
T[k] ← 0
top[S] ← top[S] − 1

Each of these operations (initialization, Search, Insert, and Delete) takes
O(1) time.
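The validating-cycle scheme can be sketched in Python. Python has no uninitialized arrays, so the sketch takes T as a caller-supplied list of arbitrary garbage and never clears it. The class name HugeArrayDictionary is illustrative, and the code uses 0-based stack indices, so the valid entries are S[0 .. top − 1] and the T[k] ← 0 step of the solution is not needed (a stale T[k] simply fails the validation test).

```python
class HugeArrayDictionary:
    def __init__(self, T, capacity):
        # T stands in for the huge, uninitialized array: its initial
        # contents are arbitrary and are never cleared here.
        self.T = T
        self.S = [None] * capacity        # stack of stored keys
        self.S_prime = [None] * capacity  # parallel array of objects
        self.top = 0                      # entries S[0 .. top-1] are valid

    def _valid(self, k):
        # Validating cycle: T[k] indexes a live stack entry that holds k.
        j = self.T[k]
        return isinstance(j, int) and 0 <= j < self.top and self.S[j] == k

    def search(self, k):
        return self.S_prime[self.T[k]] if self._valid(k) else None

    def insert(self, k, x):
        # Assumes k is not already present.
        self.S[self.top] = k
        self.S_prime[self.top] = x
        self.T[k] = self.top
        self.top += 1

    def delete(self, k):
        # Assumes k is present: move the top stack entry into k's position,
        # then fix up that entry's validating cycle.
        j = self.T[k]
        last = self.top - 1
        self.S[j] = self.S[last]
        self.S_prime[j] = self.S_prime[last]
        self.T[self.S[j]] = j
        self.top = last
```

Every method touches a constant number of array entries, so each runs in O(1) time without ever initializing T.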


Solution to Exercise 11.2-1 


For each pair of keys k, l, where k ≠ l, define the indicator random variable
X_kl = I{h(k) = h(l)}. Since we assume simple uniform hashing, Pr{X_kl = 1} =
Pr{h(k) = h(l)} = 1/m, and so E[X_kl] = 1/m.


Now define the random variable Y to be the total number of collisions, so that
Y = Σ_{k≠l} X_kl. The expected number of collisions is

\begin{align*}
E[Y] &= E\left[\sum_{k \ne l} X_{kl}\right] \\
&= \sum_{k \ne l} E[X_{kl}] && \text{(linearity of expectation)} \\
&= \binom{n}{2} \frac{1}{m} \\
&= \frac{n(n-1)}{2} \cdot \frac{1}{m} \\
&= \frac{n(n-1)}{2m} .
\end{align*}
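For small n and m, the result n(n − 1)/2m can be confirmed by brute force. The Python sketch below enumerates all m^n assignments of n distinct keys to m slots, treating them as equally likely (an assumption corresponding to independent, uniformly distributed hash values), and averages the number of colliding pairs. The function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations, product


def expected_collisions_brute_force(n, m):
    """Average number of colliding pairs over all m**n equally likely
    assignments of n distinct keys to m slots."""
    total = 0
    for slots in product(range(m), repeat=n):
        # Count unordered pairs of keys that landed in the same slot.
        total += sum(1 for a, b in combinations(slots, 2) if a == b)
    return Fraction(total, m ** n)
```

For n = 3 and m = 2 the enumeration gives 3/2, matching n(n − 1)/2m = 6/4.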


Solution to Exercise 11.2-4 

The flag in each slot will indicate whether the slot is free. 

• A free slot is in the free list, a doubly linked list of all free slots in the table. 
The slot thus contains two pointers. 

• A used slot contains an element and a pointer (possibly NIL) to the next element 
that hashes to this slot. (Of course, that pointer points to another slot in the 
table.) 

Operations 

• Insertion: 

• If the element hashes to a free slot, just remove the slot from the free list and
store the element there (with a NIL pointer). The free list must be doubly
linked in order for this deletion to run in O(1) time.

• If the element hashes to a used slot j, check whether the element x already
there “belongs” there (its key also hashes to slot j).

• If so, add the new element to the chain of elements in this slot. To do
so, allocate a free slot (e.g., take the head of the free list) for the new
element and put this new slot at the head of the list pointed to by the
hashed-to slot (j).

• If not, x is part of another slot’s chain. Move it to a new slot by
allocating one from the free list, copying the old slot’s (j’s) contents
(element x and pointer) to the new slot, and updating the pointer in the slot
that pointed to j to point to the new slot. Then insert the new element in
the now-empty slot as usual.

To update the pointer to j, it is necessary to find it by searching the chain
of elements starting in the slot x hashes to.

• Deletion: Let j be the slot the element x to be deleted hashes to.

• If x is the only element in j (j doesn’t point to any other entries), just free
the slot, returning it to the head of the free list.

• If x is in j but there’s a pointer to a chain of other elements, move the first
pointed-to entry to slot j and free the slot it was in.

• If x is found by following a pointer from j, just free x’s slot and splice it out
of the chain (i.e., update the slot that pointed to x to point to x’s successor).

• Searching: Check the slot the key hashes to, and if that is not the desired
element, follow the chain of pointers from the slot.

All the operations take expected O(1) time for the same reason they do with
the version in the book: The expected time to search the chains is O(1 + α)
regardless of where the chains are stored, and the fact that all the elements are
stored in the table means that α ≤ 1. If the free list were singly linked, then
operations that involved removing an arbitrary slot from the free list would not
run in O(1) time.


Solution to Exercise 11.3-3 

First, we observe that we can generate any permutation by a sequence of
interchanges of pairs of characters. One can prove this property formally, but
informally, consider that both heapsort and quicksort work by interchanging pairs of
elements and that they have to be able to produce any permutation of their input
array. Thus, it suffices to show that if string x can be derived from string y by
interchanging a single pair of characters, then x and y hash to the same value.

Let us denote the ith character in x by $x_i$, and similarly for y. The 
interpretation of x in radix $2^p$ is $\sum_{i=0}^{n-1} x_i 2^{ip}$, and so 
$h(x) = \left(\sum_{i=0}^{n-1} x_i 2^{ip}\right) \bmod (2^p - 1)$. 
Similarly, $h(y) = \left(\sum_{i=0}^{n-1} y_i 2^{ip}\right) \bmod (2^p - 1)$. 

Suppose that x and y are identical strings of n characters except that the characters 
in positions a and b are interchanged: $x_a = y_b$ and $y_a = x_b$. Without loss of 
generality, let a > b. We have 
\[
h(x) - h(y) = \left(\sum_{i=0}^{n-1} x_i 2^{ip}\right) \bmod (2^p - 1)
            - \left(\sum_{i=0}^{n-1} y_i 2^{ip}\right) \bmod (2^p - 1) .
\]

Since $0 \le h(x), h(y) < 2^p - 1$, we have $-(2^p - 1) < h(x) - h(y) < 2^p - 1$. 

If we show that $(h(x) - h(y)) \bmod (2^p - 1) = 0$, then $h(x) = h(y)$. 

Since the sums in the hash functions are the same except for indices a and b, we 
have 
\[
\begin{aligned}
(h(x) - h(y)) \bmod (2^p - 1)
 &= ((x_a 2^{ap} + x_b 2^{bp}) - (y_a 2^{ap} + y_b 2^{bp})) \bmod (2^p - 1) \\
 &= ((x_a 2^{ap} + x_b 2^{bp}) - (x_b 2^{ap} + x_a 2^{bp})) \bmod (2^p - 1) \\
 &= ((x_a - x_b) 2^{ap} - (x_a - x_b) 2^{bp}) \bmod (2^p - 1) \\
 &= ((x_a - x_b)(2^{ap} - 2^{bp})) \bmod (2^p - 1) \\
 &= ((x_a - x_b) 2^{bp} (2^{(a-b)p} - 1)) \bmod (2^p - 1) .
\end{aligned}
\]

By equation (A.5), 
\[
\sum_{i=0}^{a-b-1} 2^{pi} = \frac{2^{(a-b)p} - 1}{2^p - 1} ,
\]
and multiplying both sides by $2^p - 1$, we get 
$2^{(a-b)p} - 1 = \left(\sum_{i=0}^{a-b-1} 2^{pi}\right)(2^p - 1)$. 

Thus, 
\[
\begin{aligned}
(h(x) - h(y)) \bmod (2^p - 1)
 &= \left((x_a - x_b) 2^{bp} \left(\sum_{i=0}^{a-b-1} 2^{pi}\right)(2^p - 1)\right) \bmod (2^p - 1) \\
 &= 0 ,
\end{aligned}
\]
since one of the factors is $2^p - 1$. 

We have shown that $(h(x) - h(y)) \bmod (2^p - 1) = 0$, and so $h(x) = h(y)$. 
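
The argument above can be spot-checked numerically. The following is a minimal Python sketch, not part of the original solution: the names `radix_hash` and `all_permutations_collide` are illustrative, the default p = 8 is an assumption suited to ASCII text, and every character code is assumed to be less than $2^p$.

```python
from itertools import permutations


def radix_hash(s: str, p: int = 8) -> int:
    """Interpret s as a radix-2^p number and reduce it mod 2^p - 1.

    Assumes every character code is below 2^p (true for ASCII when p = 8).
    """
    value = 0
    for ch in s:                       # most significant character first
        value = (value << p) + ord(ch)
    return value % ((1 << p) - 1)


def all_permutations_collide(s: str, p: int = 8) -> bool:
    """Return True if every permutation of s hashes to the same value as s."""
    target = radix_hash(s, p)
    return all(radix_hash("".join(q), p) == target for q in permutations(s))
```

Because $2^p \equiv 1 \pmod{2^p - 1}$, the hash reduces to the sum of the character codes modulo $2^p - 1$, which is why reordering the characters cannot change it.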


Solution to Exercise 11.3-5 

Let b = |B| and u = |U|. We start by showing that the total number of collisions 
is minimized by a hash function that maps u/b elements of U to each of the b 
values in B. For a given hash function, let $u_j$ be the number of elements that map 
to j ∈ B. We have $u = \sum_{j \in B} u_j$. We also have that the number of collisions 
for a given value of j ∈ B is $\binom{u_j}{2} = u_j(u_j - 1)/2$. 

Lemma 

The total number of collisions is minimized when $u_j = u/b$ for each j ∈ B. 

Proof If $u_j \le u/b$, let us call j underloaded, and if $u_j \ge u/b$, let us call j 
overloaded. Consider an unbalanced situation in which $u_j \ne u/b$ for at least 
one value j ∈ B. We can think of converting a balanced situation in which all 
$u_j$ equal u/b into the unbalanced situation by repeatedly moving an element that 
maps to an underloaded value to map instead to an overloaded value. (If you think 
of the values of B as representing buckets, we are repeatedly moving elements 
from buckets containing at most u/b elements to buckets containing at least u/b 
elements.) 

We now show that each such move increases the number of collisions, so that 
all the moves together must increase the number of collisions. Suppose that 
we move an element from an underloaded value j to an overloaded value k, 
and we leave all other elements alone. Because j is underloaded and k is 
overloaded, $u_j \le u/b \le u_k$. Considering just the collisions for values j 
and k, we have $u_j(u_j - 1)/2 + u_k(u_k - 1)/2$ collisions before the move and 
$(u_j - 1)(u_j - 2)/2 + (u_k + 1)u_k/2$ collisions afterward. We wish to show that 
$u_j(u_j - 1)/2 + u_k(u_k - 1)/2 < (u_j - 1)(u_j - 2)/2 + (u_k + 1)u_k/2$. We have the 
following sequence of equivalent inequalities: 


\[
\begin{aligned}
u_j &< u_k + 1 \\
2u_j &< 2u_k + 2 \\
-u_k &< u_k - 2u_j + 2 \\
u_j^2 - u_j + u_k^2 - u_k &< u_j^2 - 3u_j + 2 + u_k^2 + u_k \\
u_j(u_j - 1) + u_k(u_k - 1) &< (u_j - 1)(u_j - 2) + (u_k + 1)u_k \\
u_j(u_j - 1)/2 + u_k(u_k - 1)/2 &< (u_j - 1)(u_j - 2)/2 + (u_k + 1)u_k/2
\end{aligned}
\]

Thus, each move increases the number of collisions. We conclude that the number 
of collisions is minimized when $u_j = u/b$ for each j ∈ B. ■ 

By the above lemma, for any hash function, the total number of collisions must 
be at least b(u/b)(u/b − 1)/2. The number of pairs of distinct elements is 
$\binom{u}{2} = u(u - 1)/2$. Thus, the number of collisions per pair of distinct 
elements must be at least 
\[
\begin{aligned}
\frac{b(u/b)(u/b - 1)/2}{u(u - 1)/2} &= \frac{u/b - 1}{u - 1} \\
 &> \frac{u/b - 1}{u} \\
 &= \frac{1}{b} - \frac{1}{u} .
\end{aligned}
\]

Thus, the bound ε on the probability of a collision for any pair of distinct elements 
can be no less than 1/b − 1/u = 1/|B| − 1/|U|. 
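
To make the counting concrete, here is a small Python sketch (the helper names `collision_fraction` and `lower_bound` are hypothetical, introduced only for illustration). Given the per-slot loads $u_j$, it computes the fraction of distinct key pairs that collide, using exact rational arithmetic, so the result can be compared with 1/b − 1/u.

```python
from fractions import Fraction


def collision_fraction(loads):
    """Fraction of distinct key pairs that collide, given per-slot loads u_j."""
    u = sum(loads)
    collisions = sum(uj * (uj - 1) // 2 for uj in loads)   # sum of C(u_j, 2)
    pairs = u * (u - 1) // 2                               # C(u, 2)
    return Fraction(collisions, pairs)


def lower_bound(u, b):
    """The bound 1/b - 1/u from the solution above."""
    return Fraction(1, b) - Fraction(1, u)
```

For example, with u = 12 keys spread evenly over b = 4 slots, the collision fraction is 2/11, which exceeds the bound 1/4 − 1/12 = 1/6; moving one key from one slot to another raises the fraction further.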


Solution to Problem 11-1 

a. Since we assume uniform hashing, we can use the same observation as is used in 
Corollary 11.7: that inserting a key entails an unsuccessful search followed by 
placing the key into the first empty slot found. As in the proof of Theorem 11.6, 
if we let X be the random variable denoting the number of probes in an 
unsuccessful search, then $\Pr\{X \ge i\} \le \alpha^{i-1}$. Since n ≤ m/2, we have 
α ≤ 1/2. Letting i = k + 1, we have 
$\Pr\{X > k\} = \Pr\{X \ge k + 1\} \le (1/2)^{(k+1)-1} = 2^{-k}$. 

b. Substituting k = 2 lg n into the statement of part (a) yields that the probability 
that the ith insertion requires more than k = 2 lg n probes is at most 
$2^{-2 \lg n} = (2^{\lg n})^{-2} = n^{-2} = 1/n^2$. 

c. Let the event A be X > 2 lg n, and for i = 1, 2, ..., n, let the event $A_i$ be 
$X_i > 2 \lg n$. In part (b), we showed that $\Pr\{A_i\} \le 1/n^2$ for i = 1, 2, ..., n. 
From how we defined these events, $A = A_1 \cup A_2 \cup \cdots \cup A_n$. Using Boole’s 
inequality, (C.18), we have 
\[
\begin{aligned}
\Pr\{A\} &\le \Pr\{A_1\} + \Pr\{A_2\} + \cdots + \Pr\{A_n\} \\
 &\le n \cdot \frac{1}{n^2} \\
 &= 1/n .
\end{aligned}
\]

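d. We use the definition of expectation and break the sum into two parts: 
\[
\begin{aligned}
E[X] &= \sum_{k=1}^{n} k \cdot \Pr\{X = k\} \\
 &= \sum_{k=1}^{\lceil 2 \lg n \rceil} k \cdot \Pr\{X = k\}
    + \sum_{k=\lceil 2 \lg n \rceil + 1}^{n} k \cdot \Pr\{X = k\} \\
 &\le \sum_{k=1}^{\lceil 2 \lg n \rceil} \lceil 2 \lg n \rceil \cdot \Pr\{X = k\}
    + \sum_{k=\lceil 2 \lg n \rceil + 1}^{n} n \cdot \Pr\{X = k\} \\
 &= \lceil 2 \lg n \rceil \sum_{k=1}^{\lceil 2 \lg n \rceil} \Pr\{X = k\}
    + n \sum_{k=\lceil 2 \lg n \rceil + 1}^{n} \Pr\{X = k\} .
\end{aligned}
\]

Since X takes on exactly one value, we have that 
$\sum_{k=1}^{\lceil 2 \lg n \rceil} \Pr\{X = k\} = \Pr\{X \le \lceil 2 \lg n \rceil\} \le 1$ and 
$\sum_{k=\lceil 2 \lg n \rceil + 1}^{n} \Pr\{X = k\} \le \Pr\{X > 2 \lg n\} \le 1/n$, 
by part (c). Therefore, 
\[
\begin{aligned}
E[X] &\le \lceil 2 \lg n \rceil \cdot 1 + n \cdot (1/n) \\
 &= \lceil 2 \lg n \rceil + 1 \\
 &= O(\lg n) .
\end{aligned}
\]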


Solution to Problem 11-2 

a. A particular key is hashed to a particular slot with probability 1/n. Suppose we 
select a specific set of k keys. The probability that these k keys are inserted into 
the slot in question and that all other keys are inserted elsewhere is 
\[
\left(\frac{1}{n}\right)^{k} \left(1 - \frac{1}{n}\right)^{n-k} .
\]
Since there are $\binom{n}{k}$ ways to choose our k keys, we get 
\[
Q_k = \left(\frac{1}{n}\right)^{k} \left(1 - \frac{1}{n}\right)^{n-k} \binom{n}{k} .
\]

b. For i = 1, 2, ..., n, let $X_i$ be a random variable denoting the number of keys 
that hash to slot i, and let $A_i$ be the event that $X_i = k$, i.e., that exactly k keys 
hash to slot i. From part (a), we have $\Pr\{A_i\} = Q_k$. Then, 
\[
\begin{aligned}
P_k &= \Pr\{M = k\} \\
 &= \Pr\left\{\left(\max_{1 \le i \le n} X_i\right) = k\right\} \\
 &= \Pr\{\text{there exists } i \text{ such that } X_i = k \text{ and that } X_i \le k \text{ for } i = 1, 2, \ldots, n\} \\
 &\le \Pr\{\text{there exists } i \text{ such that } X_i = k\} \\
 &= \Pr\{A_1 \cup A_2 \cup \cdots \cup A_n\} \\
 &\le \Pr\{A_1\} + \Pr\{A_2\} + \cdots + \Pr\{A_n\} \qquad \text{(by inequality (C.18))} \\
 &= n Q_k .
\end{aligned}
\]

c. We start by showing two facts. First, 1 − 1/n < 1, which implies 
$(1 - 1/n)^{n-k} < 1$. Second, $n!/(n-k)! = n \cdot (n-1) \cdot (n-2) \cdots (n-k+1) < n^k$. 
Using these facts, along with the simplification $k! > (k/e)^k$ of equation (3.17), 
we have 
\[
\begin{aligned}
Q_k &= \left(\frac{1}{n}\right)^{k} \left(1 - \frac{1}{n}\right)^{n-k} \frac{n!}{k!\,(n-k)!} \\
 &< \frac{n!}{n^k\, k!\, (n-k)!} && ((1 - 1/n)^{n-k} < 1) \\
 &< \frac{1}{k!} && (n!/(n-k)! < n^k) \\
 &< \frac{e^k}{k^k} && (k! > (k/e)^k) .
\end{aligned}
\]


d. Notice that when n = 2, lg lg n = 0, so to be precise, we need to assume that 
n ≥ 3. 

In part (c), we showed that $Q_k < e^k/k^k$ for any k; in particular, this inequality 
holds for $k_0$. Thus, it suffices to show that $e^{k_0}/k_0^{k_0} < 1/n^3$ or, equivalently, that 
$n^3 < k_0^{k_0}/e^{k_0}$. 

Taking logarithms of both sides gives an equivalent condition: 
\[
\begin{aligned}
3 \lg n &< k_0 (\lg k_0 - \lg e) \\
 &= \frac{c \lg n}{\lg \lg n} (\lg c + \lg \lg n - \lg \lg \lg n - \lg e) .
\end{aligned}
\]

Dividing both sides by lg n gives the condition 
\[
\begin{aligned}
3 &< \frac{c}{\lg \lg n} (\lg c + \lg \lg n - \lg \lg \lg n - \lg e) \\
 &= c \left(1 + \frac{\lg c - \lg e}{\lg \lg n} - \frac{\lg \lg \lg n}{\lg \lg n}\right) .
\end{aligned}
\]

Let x be the last expression in parentheses: 
\[
x = \left(1 + \frac{\lg c - \lg e}{\lg \lg n} - \frac{\lg \lg \lg n}{\lg \lg n}\right) .
\]
We need to show that there exists a constant c > 1 such that 3 < cx. 

Noting that $\lim_{n \to \infty} x = 1$, we see that there exists $n_0$ such that x ≥ 1/2 for all 
$n \ge n_0$. Thus, any constant c > 6 works for $n \ge n_0$. 

We handle smaller values of n (in particular, $3 \le n < n_0$) as follows. Since 
n is constrained to be an integer, there are a finite number of n in the range 
$3 \le n < n_0$. We can evaluate the expression x for each such value of n and 
determine a value of c for which 3 < cx for all values of n. The final value of c 
that we use is the larger of 

• 6, which works for all $n \ge n_0$, and 

• $\max_{3 \le n < n_0} \{c : 3 < cx\}$, i.e., the largest value of c that we chose for the range 
$3 \le n < n_0$. 

Thus, we have shown that $Q_{k_0} < 1/n^3$, as desired. 

To see that $P_k < 1/n^2$ for $k \ge k_0$, we observe that by part (b), $P_k \le nQ_k$ for 
all k. Choosing $k = k_0$ gives $P_{k_0} \le nQ_{k_0} < n \cdot (1/n^3) = 1/n^2$. For $k > k_0$, we 
will show that we can pick the constant c such that $Q_k < 1/n^3$ for all $k \ge k_0$, 
and thus conclude that $P_k < 1/n^2$ for all $k \ge k_0$. 

To pick c as required, we let c be large enough that $k_0 > 3 > e$. Then e/k < 1 
for all $k \ge k_0$, and so $e^k/k^k$ decreases as k increases. Thus, 
\[
\begin{aligned}
Q_k &< e^k/k^k \\
 &\le e^{k_0}/k_0^{k_0} \\
 &< 1/n^3
\end{aligned}
\]
for $k \ge k_0$. 

e. The expectation of M is 
\[
\begin{aligned}
E[M] &= \sum_{k=0}^{n} k \cdot \Pr\{M = k\} \\
 &= \sum_{k=0}^{k_0} k \cdot \Pr\{M = k\} + \sum_{k=k_0+1}^{n} k \cdot \Pr\{M = k\} \\
 &\le \sum_{k=0}^{k_0} k_0 \cdot \Pr\{M = k\} + \sum_{k=k_0+1}^{n} n \cdot \Pr\{M = k\} \\
 &\le k_0 \sum_{k=0}^{k_0} \Pr\{M = k\} + n \sum_{k=k_0+1}^{n} \Pr\{M = k\} \\
 &= k_0 \cdot \Pr\{M \le k_0\} + n \cdot \Pr\{M > k_0\} ,
\end{aligned}
\]
which is what we needed to show, since $k_0 = c \lg n / \lg \lg n$. 

To show that E[M] = O(lg n / lg lg n), note that $\Pr\{M \le k_0\} \le 1$ and 
\[
\begin{aligned}
\Pr\{M > k_0\} &= \sum_{k=k_0+1}^{n} \Pr\{M = k\} \\
 &= \sum_{k=k_0+1}^{n} P_k \\
 &< \sum_{k=k_0+1}^{n} 1/n^2 && \text{(by part (d))} \\
 &< n \cdot (1/n^2) \\
 &= 1/n .
\end{aligned}
\]

We conclude that 
\[
\begin{aligned}
E[M] &\le k_0 \cdot 1 + n \cdot (1/n) \\
 &= k_0 + 1 \\
 &= O(\lg n / \lg \lg n) .
\end{aligned}
\]


Solution to Problem 11-3 


a. From how the probe-sequence computation is specified, it is easy to see that 
the probe sequence is ⟨h(k), h(k) + 1, h(k) + 1 + 2, h(k) + 1 + 2 + 3, ..., 
h(k) + 1 + 2 + 3 + ··· + i, ...⟩, where all the arithmetic is modulo m. Starting 
the probe numbers from 0, the ith probe is offset (modulo m) from h(k) by 
\[
\sum_{j=0}^{i} j = \frac{i(i+1)}{2} = \frac{1}{2} i^2 + \frac{1}{2} i .
\]

Thus, we can write the probe sequence as 
\[
h'(k, i) = \left(h(k) + \frac{1}{2} i + \frac{1}{2} i^2\right) \bmod m ,
\]
which demonstrates that this scheme is a special case of quadratic probing. 


b. Let h'(k, i) denote the ith probe of our scheme. We saw in part (a) that 
h'(k, i) = (h(k) + i(i + 1)/2) mod m. To show that our algorithm examines 
every table position in the worst case, we show that for a given key, each of 
the first m probes hashes to a distinct value. That is, for any key k and for any 
probe numbers i and j such that 0 ≤ i < j < m, we have h'(k, i) ≠ h'(k, j). 
We do so by showing that h'(k, i) = h'(k, j) yields a contradiction. 

Let us assume that there exists a key k and probe numbers i and j satisfying 
0 ≤ i < j < m for which h'(k, i) = h'(k, j). Then 

h(k) + i(i + 1)/2 ≡ h(k) + j(j + 1)/2 (mod m) , 

which in turn implies that 

i(i + 1)/2 ≡ j(j + 1)/2 (mod m) , 

or 

j(j + 1)/2 − i(i + 1)/2 ≡ 0 (mod m) . 

Since j(j + 1)/2 − i(i + 1)/2 = (j − i)(j + i + 1)/2, we have 

(j − i)(j + i + 1)/2 ≡ 0 (mod m) . 

The factors j − i and j + i + 1 must have different parities, i.e., j − i is even if and 
only if j + i + 1 is odd. (Work out the various cases in which i and j are even and 
odd.) Since (j − i)(j + i + 1)/2 ≡ 0 (mod m), we have (j − i)(j + i + 1)/2 = rm 
for some integer r or, equivalently, (j − i)(j + i + 1) = r · 2m. Using the 
assumption that m is a power of 2, let $m = 2^p$ for some nonnegative integer p, 
so that now we have $(j - i)(j + i + 1) = r \cdot 2^{p+1}$. Because exactly one of 
the factors j − i and j + i + 1 is even, $2^{p+1}$ must divide one of the factors. It 
cannot be j − i, since $j - i < m < 2^{p+1}$. But it also cannot be j + i + 1, since 
$j + i + 1 \le (m - 1) + (m - 2) + 1 = 2m - 2 < 2^{p+1}$. Thus we have derived 
the contradiction that $2^{p+1}$ divides neither of the factors j − i and j + i + 1. 
We conclude that h'(k, i) ≠ h'(k, j). 
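
The claim in part (b) is easy to confirm by brute force for small tables. The sketch below is illustrative only: `probe_sequence` and `visits_every_slot` are made-up names, and `home` stands in for the value h(k).

```python
def probe_sequence(home: int, m: int):
    """Slots visited by h'(k, i) = (h(k) + i(i+1)/2) mod m for i = 0..m-1."""
    return [(home + i * (i + 1) // 2) % m for i in range(m)]


def visits_every_slot(m: int) -> bool:
    """True if, for every starting slot, the first m probes are all distinct."""
    return all(len(set(probe_sequence(home, m))) == m for home in range(m))
```

For m = 8 and h(k) = 0 the sequence is 0, 1, 3, 6, 2, 7, 5, 4, which covers the whole table. For a table size that is not a power of 2, such as m = 6, probes repeat and some slots are never examined, so the power-of-2 assumption matters.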




Lecture Notes for Chapter 12: 
Binary Search Trees 


Chapter 12 overview 

Search trees 

• Data structures that support many dynamic-set operations. 

• Can be used as both a dictionary and as a priority queue. 

• Basic operations take time proportional to the height of the tree. 

• For complete binary tree with n nodes: worst case Θ(lg n). 

• For linear chain of n nodes: worst case Θ(n). 

• Different types of search trees include binary search trees, red-black trees (cov¬ 
ered in Chapter 13), and B-trees (covered in Chapter 18). 

We will cover binary search trees, tree walks, and operations on binary search trees. 


Binary search trees 

Binary search trees are an important data structure for dynamic sets. 

• Accomplish many dynamic-set operations in O(h) time, where h = height of 
tree. 

• As in Section 10.4, we represent a binary tree by a linked data structure in which 
each node is an object. 

• root[T] points to the root of tree T. 

• Each node contains the fields 

• key (and possibly other satellite data). 

• left: points to left child. 

• right: points to right child. 

• p: points to parent. p[root[T]] = NIL. 

• Stored keys must satisfy the binary-search-tree property. 

• If y is in left subtree of x, then key[y] ≤ key[x]. 

• If y is in right subtree of x, then key[y] ≥ key[x]. 

Draw sample tree. 

[This is Figure 12.1(a) from the text, using A, B, D, F, H, K in place of 2, 3, 5, 
5, 7, 8, with alphabetic comparisons. It’s OK to have duplicate keys, though there 
are none in this example. Show that the binary-search-tree property holds.] 



The binary-search-tree property allows us to print keys in a binary search tree in 
order, recursively, using an algorithm called an inorder tree walk. Elements are 
printed in monotonically increasing order. 

How Inorder-Tree-Walk works: 

• Check to make sure that x is not NIL. 

• Recursively, print the keys of the nodes in x’s left subtree. 

• Print x’s key. 

• Recursively, print the keys of the nodes in x’s right subtree. 

Inorder-Tree-Walk (x) 
    if x ≠ NIL 
        then Inorder-Tree-Walk (left[x]) 
             print key[x] 
             Inorder-Tree-Walk (right[x]) 

Example: Do the inorder tree walk on the example above, getting the output 
ABDFHK. 

Correctness: Follows by induction directly from the binary-search-tree property. 

Time: Intuitively, the walk takes Θ(n) time for a tree with n nodes, because we 
visit and print each node once. [Book has formal proof.] 
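
For instructors who want a runnable version, the walk can be sketched in Python as follows. This is an illustration rather than the text’s code: the `Node` class and the `visit` callback are assumptions introduced here so that the output can be collected instead of printed.

```python
class Node:
    """A binary-search-tree node; None plays the role of NIL."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def inorder_tree_walk(x, visit=print):
    """Visit the keys in the subtree rooted at x in nondecreasing order."""
    if x is not None:
        inorder_tree_walk(x.left, visit)
        visit(x.key)
        inorder_tree_walk(x.right, visit)
```

Usage, assuming the sample tree has root F, left subtree B (children A and D), and right subtree H (right child K): build it as `Node("F", Node("B", Node("A"), Node("D")), Node("H", None, Node("K")))` and pass `out.append` as `visit` to collect the keys in a list `out`.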


Querying a binary search tree 

Searching 

Tree-Search (x, k) 
    if x = NIL or k = key[x] 
        then return x 
    if k < key[x] 
        then return Tree-Search (left[x], k) 
        else return Tree-Search (right[x], k) 

Initial call is Tree-Search (root[T], k). 

Example: Search for values D and C in the example tree from above. 

Time: The algorithm recurses, visiting nodes on a downward path from the root. 
Thus, running time is O(h), where h is the height of the tree. 

[The text also gives an iterative version of Tree-Search, which is more effi¬ 
cient on most computers. The above recursive procedure is more straightforward, 
however.] 
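
The iterative variant mentioned in the note can be sketched in Python as below. This is a hedged illustration, not the text’s pseudocode: the `Node` class is an assumption, and `None` stands in for NIL.

```python
class Node:
    """A binary-search-tree node; None plays the role of NIL."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def tree_search(x, k):
    """Return a node with key k in the subtree rooted at x, or None if absent."""
    while x is not None and k != x.key:
        # Follow one downward path: left for smaller keys, right otherwise.
        x = x.left if k < x.key else x.right
    return x
```

The loop follows the same downward path as the recursive procedure, so it also runs in O(h) time, but it uses constant extra space instead of a recursion stack.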


Minimum and maximum 

The binary-search-tree property guarantees that 

• the minimum key of a binary search tree is located at the leftmost node, and 

• the maximum key of a binary search tree is located at the rightmost node. 

Traverse the appropriate pointers ( left or right) until NIL is reached. 

Tree-Minimum (x) 
    while left[x] ≠ NIL 
        do x ← left[x] 
    return x 

Tree-Maximum (x) 
    while right[x] ≠ NIL 
        do x ← right[x] 
    return x 

Time: Both procedures visit nodes that form a downward path from the root to a 
leaf. Both procedures run in O(h) time, where h is the height of the tree. 

Successor and predecessor 

Assuming that all keys are distinct, the successor of a node x is the node y such 
that key[y] is the smallest key > key[x]. (We can find x’s successor based entirely 
on the tree structure. No key comparisons are necessary.) If x has the largest key 
in the binary search tree, then we say that x’s successor is NIL. 

There are two cases: 

1. If node x has a non-empty right subtree, then x’s successor is the minimum in 
x’s right subtree. 

2. If node x has an empty right subtree, notice that: 

• As long as we move to the left up the tree (move up through right children), 
we’re visiting smaller keys. 

• x’s successor y is the node that x is the predecessor of (x is the maximum in 
y’s left subtree). 

Tree-Successor (x) 
    if right[x] ≠ NIL 
        then return Tree-Minimum (right[x]) 
    y ← p[x] 
    while y ≠ NIL and x = right[y] 
        do x ← y 
           y ← p[y] 
    return y 

Tree-Predecessor is symmetric to Tree-Successor. 


Example: 



• Find the successor of the node with key value 15. (Answer: Key value 17) 

• Find the successor of the node with key value 6. (Answer: Key value 7) 

• Find the successor of the node with key value 4. (Answer: Key value 6) 

• Find the predecessor of the node with key value 6. (Answer: Key value 4) 

Time: For both the Tree-Successor and Tree-Predecessor procedures, in 
both cases, we visit nodes on a path down the tree or up the tree. Thus, running 
time is O(h), where h is the height of the tree. 
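
A minimal Python sketch of the minimum and successor procedures follows. The `Node` class with a parent field `p` and the `link` helper are assumptions made for this illustration; `None` stands in for NIL.

```python
class Node:
    """A node with left, right, and parent (p) pointers; None plays NIL."""

    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


def link(parent, left=None, right=None):
    """Hypothetical helper: attach children and set their parent pointers."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.p = parent
    return parent


def tree_minimum(x):
    """Follow left pointers from x until there is no left child."""
    while x.left is not None:
        x = x.left
    return x


def tree_successor(x):
    """Return the node that follows x in an inorder walk, or None."""
    if x.right is not None:
        return tree_minimum(x.right)       # case 1: minimum of right subtree
    y = x.p
    while y is not None and x is y.right:  # case 2: climb through right children
        x, y = y, y.p
    return y
```

Note that the code compares nodes by identity (`is`), never by key, which matches the remark that no key comparisons are necessary.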


Insertion and deletion 

Insertion and deletion allow the dynamic set represented by a binary search tree 
to change. The binary-search-tree property must hold after the change. Insertion is 
more straightforward than deletion. 

Insertion 

Tree-Insert (T, z) 
    y ← NIL 
    x ← root[T] 
    while x ≠ NIL 
        do y ← x 
           if key[z] < key[x] 
               then x ← left[x] 
               else x ← right[x] 
    p[z] ← y 
    if y = NIL 
        then root[T] ← z        ▷ Tree T was empty 
        else if key[z] < key[y] 
                 then left[y] ← z 
                 else right[y] ← z 

• To insert value v into the binary search tree, the procedure is given node z, with 
key[z] = v, left[z] = NIL, and right[z] = NIL. 

• Beginning at root of the tree, trace a downward path, maintaining two pointers. 

• Pointer x: traces the downward path. 

• Pointer y: “trailing pointer” to keep track of parent of x. 

• Traverse the tree downward by comparing the value of node at x with v, and 
move to the left or right child accordingly. 

• When x is NIL, it is at the correct position for node z. 

• Compare z’s value with y’s value, and insert z at either y’s left or right, 
appropriately. 

Example: Run Tree-Insert (C) on the first sample binary search tree. Result: 



Time: Same as Tree-Search. On a tree of height h, procedure takes O(h) time. 

Tree-Insert can be used with Inorder-Tree-Walk to sort a given set of num¬ 
bers. (See Exercise 12.3-3.) 
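
The insertion procedure translates almost line for line into Python. The sketch below is an illustration under stated assumptions: the `Node` and `Tree` classes are introduced here, and `None` stands in for NIL.

```python
class Node:
    """A node with left, right, and parent (p) pointers; None plays NIL."""

    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    """Holds the root pointer, mirroring root[T]."""

    def __init__(self):
        self.root = None


def tree_insert(T, z):
    """Insert node z into tree T, keeping the binary-search-tree property."""
    y = None                      # trailing pointer: parent of x
    x = T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        T.root = z                # tree T was empty
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z
```

Usage: inserting the keys F, B, H, A, D, K in that order builds a tree with F at the root; inserting C afterward places it as the left child of D.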

Deletion 

Tree-Delete is broken into three cases. 

Case 1: z has no children. 

• Delete z by making the parent of z point to NIL, instead of to z. 

Case 2: z has one child. 

• Delete z by making the parent of z point to z’s child, instead of to z. 

Case 3: z has two children. 

• z’s successor y has either no children or one child, (y is the minimum 
node—with no left child—in z’s right subtree.) 

• Delete y from the tree (via Case 1 or 2). 

• Replace z’s key and satellite data with y’s. 

Tree-Delete (T, z) 
    ▷ Determine which node y to splice out: either z or z’s successor. 
    if left[z] = NIL or right[z] = NIL 
        then y ← z 
        else y ← Tree-Successor (z) 
    ▷ x is set to a non-NIL child of y, or to NIL if y has no children. 
    if left[y] ≠ NIL 
        then x ← left[y] 
        else x ← right[y] 
    ▷ y is removed from the tree by manipulating pointers of p[y] and x. 
    if x ≠ NIL 
        then p[x] ← p[y] 
    if p[y] = NIL 
        then root[T] ← x 
        else if y = left[p[y]] 
                 then left[p[y]] ← x 
                 else right[p[y]] ← x 
    ▷ If it was z’s successor that was spliced out, copy its data into z. 
    if y ≠ z 
        then key[z] ← key[y] 
             copy y’s satellite data into z 
    return y 

Example: We can demonstrate on the above sample tree. 

• For Case 1, delete K. 

• For Case 2, delete H. 

• For Case 3, delete B, swapping it with C. 


Time: O(h), on a tree of height h. 
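
The three deletion cases can be exercised with the following Python sketch. It is an illustration only: the `Node` and `Tree` classes and the `keys_inorder` helper are assumptions introduced here, and satellite data is reduced to the key.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, z):
    y, x = None, T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def tree_successor(x):
    if x.right is not None:
        x = x.right
        while x.left is not None:
            x = x.left
        return x
    y = x.p
    while y is not None and x is y.right:
        x, y = y, y.p
    return y


def tree_delete(T, z):
    """Remove z's key from T; return the node that was spliced out."""
    # y is the node to splice out: z itself, or z's successor (two children).
    y = z if z.left is None or z.right is None else tree_successor(z)
    # x is y's only child, or None if y has no children.
    x = y.left if y.left is not None else y.right
    if x is not None:
        x.p = y.p
    if y.p is None:
        T.root = x
    elif y is y.p.left:
        y.p.left = x
    else:
        y.p.right = x
    if y is not z:
        z.key = y.key             # copy the successor's data into z
    return y


def keys_inorder(x):
    """Hypothetical helper: list the keys of the subtree at x in sorted order."""
    if x is None:
        return []
    return keys_inorder(x.left) + [x.key] + keys_inorder(x.right)
```

One design consequence worth pointing out in class: in Case 3 the node object that leaves the tree is z’s successor, not z itself, so any outside pointers to the successor node become stale.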

Minimizing running time 

We’ve been analyzing running time in terms of h (the height of the binary search 
tree), instead of n (the number of nodes in the tree). 

• Problem: Worst case for binary search tree is Θ(n), no better than linked list. 

• Solution: Guarantee small height (balanced tree): h = O(lg n). 

In later chapters, by varying the properties of binary search trees, we will be able 
to analyze running time in terms of n. 

• Method: Restructure the tree if necessary. Nothing special is required for 
querying, but there may be extra work when changing the structure of the tree 
(inserting or deleting). 

Red-black trees are a special class of binary search trees that avoid the O(n) 
worst-case behavior of “plain” binary search trees. Red-black trees are covered 
in detail in Chapter 13. 


Expected height of a randomly built binary search tree 

[These are notes on a starred section in the book. I covered this material in an 
optional lecture.] 

Given a set of n distinct keys. Insert them in random order into an initially empty 
binary search tree. 

• Each of the n ! permutations is equally likely. 

• Different from assuming that every binary search tree on n keys is equally 
likely. 

Try it for n = 3. Will get 5 different binary search trees. When we look at the 
binary search trees resulting from each of the 3! input permutations, 4 trees will 
appear once and 1 tree will appear twice. [This gives the idea for the solution 
to Exercise 12.4-3.] 

• Forget about deleting keys. 

We will show that the expected height of a randomly built binary search tree is 
O(lg n). 

Random variables 

Define the following random variables: 

• $X_n$ = height of a randomly built binary search tree on n keys. 

• $Y_n = 2^{X_n}$ = exponential height. 

• $R_n$ = rank of the root within the set of n keys used to build the binary search 
tree. 

• Equally likely to be any element of {1, 2, ..., n}. 

• If $R_n = i$, then 

• Left subtree is a randomly-built binary search tree on i − 1 keys. 

• Right subtree is a randomly-built binary search tree on n − i keys. 

Foreshadowing 

We will need to relate $E[Y_n]$ to $E[X_n]$. 

We’ll use Jensen’s inequality: 

$E[f(X)] \ge f(E[X])$ , [leave on board] 

provided 

• the expectations exist and are finite, and 

• f(x) is convex: for all x, y and all 0 ≤ λ ≤ 1, 

$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ . 



Convex = “curves upward” 

We’ll use Jensen’s inequality for $f(x) = 2^x$. 



Since $2^x$ curves upward, it’s convex. 

Formula for $Y_n$ 

Think about $Y_n$, if we know that $R_n = i$: 



Height of root is 1 more than the maximum height of its children: 

$Y_n = 2 \cdot \max(Y_{i-1}, Y_{n-i})$ . 

Base cases: 

• $Y_1 = 1$ (exponential height of a 1-node tree is $2^0 = 1$). 

• Define $Y_0 = 0$. 

Define indicator random variables $Z_{n,1}, Z_{n,2}, \ldots, Z_{n,n}$: 

$Z_{n,i} = I\{R_n = i\}$ . 

$R_n$ is equally likely to be any element of {1, 2, ..., n} 

⇒ $\Pr\{R_n = i\} = 1/n$ 

⇒ $E[Z_{n,i}] = 1/n$ [leave on board] 

(since E[I{A}] = Pr{A}) 

Consider a given n-node binary search tree (which could be a subtree). Exactly 
one $Z_{n,i}$ is 1, and all others are 0. Hence, 

$Y_n = \sum_{i=1}^{n} Z_{n,i} \cdot (2 \cdot \max(Y_{i-1}, Y_{n-i}))$ . [leave on board] 

[Recall: $Y_n = 2 \cdot \max(Y_{i-1}, Y_{n-i})$ was assuming that $R_n = i$.] 


Bounding $E[Y_n]$ 

We will show that $E[Y_n]$ is polynomial in n, which will imply that 
$E[X_n] = O(\lg n)$. 

Claim 

$Z_{n,i}$ is independent of $Y_{i-1}$ and $Y_{n-i}$. 

Justification If we choose the root such that $R_n = i$, the left subtree contains i − 1 
nodes, and it’s like any other randomly built binary search tree with i − 1 nodes. 
Other than the number of nodes, the left subtree’s structure has nothing to do with 
it being the left subtree of the root. Hence, $Y_{i-1}$ and $Z_{n,i}$ are independent. 

Similarly, $Y_{n-i}$ and $Z_{n,i}$ are independent. ■ (claim) 

Fact 

If X and Y are nonnegative random variables, then E[max(X, Y)] ≤ E[X] + E[Y]. 
[Leave on board. This is Exercise C.3-4 from the text.] 


Thus, 
\[
\begin{aligned}
E[Y_n] &= E\left[\sum_{i=1}^{n} Z_{n,i} \cdot (2 \cdot \max(Y_{i-1}, Y_{n-i}))\right] \\
 &= \sum_{i=1}^{n} E[Z_{n,i} \cdot (2 \cdot \max(Y_{i-1}, Y_{n-i}))] && \text{(linearity of expectation)} \\
 &= \sum_{i=1}^{n} E[Z_{n,i}] \cdot E[2 \cdot \max(Y_{i-1}, Y_{n-i})] && \text{(independence)} \\
 &= \sum_{i=1}^{n} \frac{1}{n} \cdot E[2 \cdot \max(Y_{i-1}, Y_{n-i})] && (E[Z_{n,i}] = 1/n) \\
 &= \frac{2}{n} \sum_{i=1}^{n} E[\max(Y_{i-1}, Y_{n-i})] && (E[aX] = a\,E[X]) \\
 &\le \frac{2}{n} \sum_{i=1}^{n} (E[Y_{i-1}] + E[Y_{n-i}]) && \text{(earlier fact)} .
\end{aligned}
\]

Observe that the last summation is 
\[
(E[Y_0] + E[Y_{n-1}]) + (E[Y_1] + E[Y_{n-2}]) + (E[Y_2] + E[Y_{n-3}]) + \cdots + (E[Y_{n-1}] + E[Y_0])
 = 2 \sum_{i=0}^{n-1} E[Y_i] ,
\]
and so we get the recurrence 
\[
E[Y_n] \le \frac{4}{n} \sum_{i=0}^{n-1} E[Y_i] . \qquad \text{[leave on board]}
\]


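Solving the recurrence 

We will show that for all integers n > 0, this recurrence has the solution 
\[
E[Y_n] \le \frac{1}{4} \binom{n+3}{3} .
\]

Lemma 
\[
\sum_{i=0}^{n-1} \binom{i+3}{3} = \binom{n+3}{4} .
\]

[This lemma solves Exercise 12.4-1.] 

Proof Use Pascal’s identity (Exercise C.1-7): 
$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$. 

Also using the simple identity $\binom{4}{4} = 1 = \binom{3}{3}$, we have 
\[
\begin{aligned}
\binom{n+3}{4} &= \binom{n+2}{3} + \binom{n+2}{4} \\
 &= \binom{n+2}{3} + \binom{n+1}{3} + \binom{n+1}{4} \\
 &= \binom{n+2}{3} + \binom{n+1}{3} + \binom{n}{3} + \binom{n}{4} \\
 &\;\;\vdots \\
 &= \binom{n+2}{3} + \binom{n+1}{3} + \binom{n}{3} + \cdots + \binom{4}{3} + \binom{4}{4} \\
 &= \binom{n+2}{3} + \binom{n+1}{3} + \binom{n}{3} + \cdots + \binom{4}{3} + \binom{3}{3} \\
 &= \sum_{i=0}^{n-1} \binom{i+3}{3} .
\end{aligned}
\]
■ (lemma) 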
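
The lemma is easy to spot-check numerically. The following Python sketch (the function name `lemma_holds` is illustrative) uses `math.comb`, which is available in Python 3.8 and later.

```python
from math import comb


def lemma_holds(n: int) -> bool:
    """Check that sum_{i=0}^{n-1} C(i+3, 3) equals C(n+3, 4) for this n."""
    return sum(comb(i + 3, 3) for i in range(n)) == comb(n + 3, 4)
```

For n = 2, for instance, the left side is C(3,3) + C(4,3) = 1 + 4 = 5 and the right side is C(5,4) = 5.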
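
We solve the recurrence by induction on n. 

Basis: n = 1. 
\[
1 = Y_1 = E[Y_1] \le \frac{1}{4} \binom{1+3}{3} = 1 .
\]

Inductive step: Assume that $E[Y_i] \le \frac{1}{4} \binom{i+3}{3}$ for all i < n. Then 
\[
\begin{aligned}
E[Y_n] &\le \frac{4}{n} \sum_{i=0}^{n-1} E[Y_i] && \text{(from before)} \\
 &\le \frac{4}{n} \sum_{i=0}^{n-1} \frac{1}{4} \binom{i+3}{3} && \text{(inductive hypothesis)} \\
 &= \frac{1}{n} \sum_{i=0}^{n-1} \binom{i+3}{3} \\
 &= \frac{1}{n} \binom{n+3}{4} && \text{(lemma)} \\
 &= \frac{1}{n} \cdot \frac{(n+3)!}{4!\,(n-1)!} \\
 &= \frac{1}{4} \cdot \frac{(n+3)!}{3!\,n!} \\
 &= \frac{1}{4} \binom{n+3}{3} .
\end{aligned}
\]

Thus, we’ve proven that $E[Y_n] \le \frac{1}{4} \binom{n+3}{3}$. 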
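
As a numerical companion to the induction, the sketch below iterates the recurrence with equality, which yields the largest values the recurrence permits given $Y_0 = 0$ and $Y_1 = 1$, and compares them with the claimed bound. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding questions.

```python
from fractions import Fraction
from math import comb


def recurrence_values(n_max: int):
    """Values y_n = (4/n) * sum_{i<n} y_i for n >= 2, with y_0 = 0, y_1 = 1."""
    y = [Fraction(0), Fraction(1)]
    for n in range(2, n_max + 1):
        y.append(Fraction(4, n) * sum(y))
    return y


def bound(n: int) -> Fraction:
    """The claimed solution (1/4) * C(n+3, 3)."""
    return Fraction(comb(n + 3, 3), 4)
```

For example, the iteration gives 2 for n = 2 and 4 for n = 3, against bounds of 5/2 and 5 respectively.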


Bounding $E[X_n]$ 

With our bound on $E[Y_n]$, we use Jensen’s inequality to bound $E[X_n]$: 

$2^{E[X_n]} \le E[2^{X_n}] = E[Y_n]$ . 

Thus, 
\[
\begin{aligned}
2^{E[X_n]} &\le \frac{1}{4} \binom{n+3}{3} \\
 &= \frac{1}{4} \cdot \frac{(n+3)(n+2)(n+1)}{6} \\
 &= O(n^3) .
\end{aligned}
\]

Taking logs of both sides gives $E[X_n] = O(\lg n)$. 

Done! 



Solutions for Chapter 12: 
Binary Search Trees 


Solution to Exercise 12.1-2 

In a heap, a node’s key is ≥ both of its children’s keys. In a binary search tree, a 
node’s key is ≥ its left child’s key, but ≤ its right child’s key. 

The heap property, unlike the binary-search-tree property, doesn’t help print the 
nodes in sorted order because it doesn’t tell which subtree of a node contains the 
element to print before that node. In a heap, the largest element smaller than the 
node could be in either subtree. 

Note that if the heap property could be used to print the keys in sorted order in 
O(n) time, we would have an O(n)-time algorithm for sorting, because building 
the heap takes only O(n) time. But we know (Chapter 8) that a comparison sort 
must take Ω(n lg n) time. 


Solution to Exercise 12.2-5 

Let x be a node with two children. In an inorder tree walk, the nodes in x ’s left 
subtree immediately precede x and the nodes in x’s right subtree immediately fol¬ 
low x. Thus, x’s predecessor is in its left subtree, and its successor is in its right 
subtree. 

Let s be x’s successor. Then s cannot have a left child, for a left child of s would 
come between x and s in the inorder walk. (It’s after x because it’s in x’s right 
subtree, and it’s before s because it’s in s’s left subtree.) If any node were to come 
between x and s in an inorder walk, then s would not be x’s successor, as we had 
supposed. 

Symmetrically, x’s predecessor has no right child. 


Solution to Exercise 12.2-7 


Note that a call to Tree-Minimum followed by n − 1 calls to Tree-Successor 
performs exactly the same inorder walk of the tree as does the procedure 
Inorder-Tree-Walk. Inorder-Tree-Walk prints the Tree-Minimum first, and by 
definition, the Tree-Successor of a node is the next node in the sorted order 
determined by an inorder tree walk. 

This algorithm runs in Θ(n) time because: 

• It requires Ω(n) time to do the n procedure calls. 

• It traverses each of the n − 1 tree edges at most twice, which takes O(n) time. 

To see that each edge is traversed at most twice (once going down the tree and once 
going up), consider the edge between any node u and either of its children, node v. 
By starting at the root, we must traverse (u, v) downward from u to v, before 
traversing it upward from v to u. The only time the tree is traversed downward is 
in code of Tree-Minimum, and the only time the tree is traversed upward is in 
code of Tree-Successor when we look for the successor of a node that has no 
right subtree. 

Suppose that v is u’s left child. 

• Before printing u, we must print all the nodes in its left subtree, which is rooted 
at v, guaranteeing the downward traversal of edge (u, v). 

• After all nodes in u’s left subtree are printed, u must be printed next. Procedure 
Tree-Successor traverses an upward path to u from the maximum element 
(which has no right subtree) in the subtree rooted at v. This path clearly includes 
edge (u, v), and since all nodes in u’s left subtree are printed, edge (u, v) is 
never traversed again. 

Now suppose that v is u’s right child. 

• After u is printed, Tree-Successor (u) is called. To get to the minimum 
element in u’s right subtree (whose root is v), the edge (u, v) must be traversed 
downward. 

• After all values in u’s right subtree are printed, Tree-Successor is called on 
the maximum element (again, which has no right subtree) in the subtree rooted 
at v. Tree-Successor traverses a path up the tree to an element after u, 
since u was already printed. Edge (u, v) must be traversed upward on this path, 
and since all nodes in u’s right subtree have been printed, edge (u, v) is never 
traversed again. 

Hence, no edge is traversed twice in the same direction. 

Therefore, this algorithm runs in Θ(n) time. 
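
The edge-counting argument can be checked empirically with the Python sketch below. It is an illustration under stated assumptions: `build` and `successor_walk` are made-up names, the minimum and successor logic is inlined so that every edge traversal can be counted, and `None` stands in for NIL.

```python
class Node:
    def __init__(self, key, p=None):
        self.key, self.p = key, p
        self.left = self.right = None


def build(keys):
    """Insert keys one at a time into an initially empty BST; return the root."""
    root = None
    for key in keys:
        y, x = None, root
        while x is not None:
            y, x = x, (x.left if key < x.key else x.right)
        z = Node(key, y)
        if y is None:
            root = z
        elif key < y.key:
            y.left = z
        else:
            y.right = z
    return root


def successor_walk(root):
    """Walk via minimum + repeated successor; return (keys, edge traversals)."""
    keys, steps = [], 0
    x = root
    while x is not None and x.left is not None:    # Tree-Minimum from the root
        x, steps = x.left, steps + 1
    while x is not None:
        keys.append(x.key)
        if x.right is not None:                    # successor, case 1
            x, steps = x.right, steps + 1
            while x.left is not None:
                x, steps = x.left, steps + 1
        else:                                      # successor, case 2
            y = x.p
            steps += 1 if y is not None else 0
            while y is not None and x is y.right:
                x, y = y, y.p
                steps += 1 if y is not None else 0
            x = y
    return keys, steps
```

On the three-node tree built from the keys 2, 1, 3, the walk outputs 1, 2, 3 and traverses edges 4 times, which is 2(n − 1): each of the two edges once downward and once upward.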


Solution to Exercise 12.3-3 


Here’s the algorithm: 

Tree-Sort (A) 
    let T be an empty binary search tree 
    for i ← 1 to n 
        do Tree-Insert (T, A[i]) 
    Inorder-Tree-Walk (root[T]) 

Worst case: Θ(n²), which occurs when a linear chain of nodes results from the 
repeated Tree-Insert operations. 

Best case: Θ(n lg n), which occurs when a binary tree of height Θ(lg n) results 
from the repeated Tree-Insert operations. 
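
A compact Python sketch of the same idea follows. It is not the text’s code: the `Node` class and the function name `tree_sort` are assumptions, and both the insertion and the inorder walk are written iteratively so that the worst-case linear chain does not exhaust Python’s recursion limit.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def tree_sort(a):
    """Return the elements of a in nondecreasing order via BST insertion."""
    root = None
    for key in a:
        z = Node(key)
        if root is None:
            root = z
            continue
        x = root
        while True:                       # walk down to the insertion point
            if key < x.key:
                if x.left is None:
                    x.left = z
                    break
                x = x.left
            else:
                if x.right is None:
                    x.right = z
                    break
                x = x.right
    out, stack, x = [], [], root          # iterative inorder walk
    while stack or x is not None:
        while x is not None:
            stack.append(x)
            x = x.left
        x = stack.pop()
        out.append(x.key)
        x = x.right
    return out
```

Already-sorted input is the worst case described above: every insertion walks the entire right-going chain, for Θ(n²) total work.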


Solution to Exercise 12.4-1 


We will answer the second part first. We shall show that if the average depth of a 
node is Θ(lg n), then the height of the tree is $O(\sqrt{n \lg n})$. Then we will answer the 
first part by exhibiting that this bound is tight: there is a binary search tree with 
average node depth Θ(lg n) and height $\Theta(\sqrt{n \lg n}) = \omega(\lg n)$. 


Lemma 

If the average depth of a node in an n-node binary search tree is Θ(lg n), then the 
height of the tree is $O(\sqrt{n \lg n})$. 


Proof Suppose that an n-node binary search tree has average depth Θ(lg n) and 
height h. Then there exists a path from the root to a node at depth h, and the depths 
of the nodes on this path are 0, 1, ..., h. Let P be the set of nodes on this path and 
Q be all other nodes. Then the average depth of a node is 
\[
\begin{aligned}
\frac{1}{n} \left(\sum_{x \in P} \mathrm{depth}(x) + \sum_{y \in Q} \mathrm{depth}(y)\right)
 &\ge \frac{1}{n} \sum_{x \in P} \mathrm{depth}(x) \\
 &= \frac{1}{n} \sum_{d=0}^{h} d \\
 &= \frac{1}{n} \cdot \Theta(h^2) .
\end{aligned}
\]

For the purpose of contradiction, suppose that h is not $O(\sqrt{n \lg n})$, so that 
$h = \omega(\sqrt{n \lg n})$. Then we have 
\[
\begin{aligned}
\frac{1}{n} \cdot \Theta(h^2) &= \frac{1}{n} \cdot \omega(n \lg n) \\
 &= \omega(\lg n) ,
\end{aligned}
\]
which contradicts the assumption that the average depth is Θ(lg n). Thus, the 
height is $O(\sqrt{n \lg n})$. ■ 


Here is an example of an n-node binary search tree with average node depth $\Theta(\lg n)$
but height $\omega(\lg n)$.

In this tree, $n - \sqrt{n \lg n}$ nodes are a complete binary tree, and the other
$\sqrt{n \lg n}$ nodes protrude from below as a single chain. This tree has height

\[
\Theta(\lg(n - \sqrt{n \lg n})) + \sqrt{n \lg n} = \Theta(\sqrt{n \lg n}) = \omega(\lg n) .
\]

To compute an upper bound on the average depth of a node, we use $O(\lg n)$ as
an upper bound on the depth of each of the $n - \sqrt{n \lg n}$ nodes in the complete
binary tree part and $O(\lg n + \sqrt{n \lg n})$ as an upper bound on the depth of each of
the $\sqrt{n \lg n}$ nodes in the protruding chain. Thus, the average depth of a node is
bounded from above by

\[
\frac{1}{n}\cdot O\bigl(\sqrt{n \lg n}\,(\lg n + \sqrt{n \lg n}) + (n - \sqrt{n \lg n}) \lg n\bigr)
  = \frac{1}{n}\cdot O(n \lg n) = O(\lg n) .
\]

To bound the average depth of a node from below, observe that the bottommost
level of the complete binary tree part has $\Theta(n - \sqrt{n \lg n})$ nodes, and each of
these nodes has depth $\Theta(\lg n)$. Thus, the average node depth is at least

\[
\frac{1}{n}\cdot\Theta\bigl((n - \sqrt{n \lg n}) \lg n\bigr) = \frac{1}{n}\cdot\Omega(n \lg n) = \Omega(\lg n) .
\]

Because the average node depth is both $O(\lg n)$ and $\Omega(\lg n)$, it is $\Theta(\lg n)$.
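As a numerical sanity check of this construction, the following sketch computes the height and average depth of the complete-tree-plus-chain shape for a given n. It is an illustration only: the function name chain_tree_stats is hypothetical, the chain length is rounded to the nearest integer, and the chain is assumed to hang from the last node of the complete tree in heap numbering.

```python
import math


def chain_tree_stats(n):
    """Return (height, average depth) of the complete-tree-plus-chain shape on n nodes."""
    m = round(math.sqrt(n * math.log2(n)))   # chain length, about sqrt(n lg n)
    k = n - m                                # nodes in the complete binary tree part
    # With heap numbering 1..k, node i of a complete binary tree has depth floor(lg i).
    tree_depths = sum(i.bit_length() - 1 for i in range(1, k + 1))
    bottom = k.bit_length() - 1              # depth of the deepest complete-tree node
    # The chain hangs below that node, at depths bottom + 1, ..., bottom + m.
    chain_depths = sum(bottom + j for j in range(1, m + 1))
    return bottom + m, (tree_depths + chain_depths) / n
```

For growing n the returned average depth stays within a constant factor of lg n, while the height grows like the square root of n lg n.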


Solution to Exercise 12.4-4 

We’ll go one better than showing that the function $2^x$ is convex. Instead, we’ll
show that the function $c^x$ is convex, for any positive constant c. According to the
definition of convexity on page 1109 of the text, a function f(x) is convex if for all
x and y and for all $0 \le \lambda \le 1$, we have
$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$.
Thus, we need to show that for all $0 \le \lambda \le 1$, we have
$c^{\lambda x + (1 - \lambda) y} \le \lambda c^x + (1 - \lambda) c^y$.

We start by proving the following lemma.

Lemma

For any real numbers a and b and any positive real number c,

\[ c^a \ge c^b + (a - b) c^b \ln c . \]

Proof  We first show that for all real r, we have $c^r \ge 1 + r \ln c$. By equation (3.11)
from the text, we have $e^x \ge 1 + x$ for all real x. Let $x = r \ln c$, so that
$e^x = e^{r \ln c} = (e^{\ln c})^r = c^r$. Then we have $c^r = e^{r \ln c} \ge 1 + r \ln c$.

Substituting a − b for r in the above inequality, we have $c^{a-b} \ge 1 + (a - b) \ln c$.
Multiplying both sides by $c^b$ gives $c^a \ge c^b + (a - b) c^b \ln c$. ■ (lemma)

Now we can show that $c^{\lambda x + (1 - \lambda) y} \le \lambda c^x + (1 - \lambda) c^y$ for all
$0 \le \lambda \le 1$. For convenience, let $z = \lambda x + (1 - \lambda) y$.

In the inequality given by the lemma, substitute x for a and z for b, giving

\[ c^x \ge c^z + (x - z) c^z \ln c . \]

Also substitute y for a and z for b, giving

\[ c^y \ge c^z + (y - z) c^z \ln c . \]

If we multiply the first inequality by $\lambda$ and the second by $1 - \lambda$ and then add the
resulting inequalities, we get

\[
\begin{aligned}
\lambda c^x + (1 - \lambda) c^y
  &\ge \lambda (c^z + (x - z) c^z \ln c) + (1 - \lambda)(c^z + (y - z) c^z \ln c) \\
  &= \lambda c^z + \lambda x c^z \ln c - \lambda z c^z \ln c
     + (1 - \lambda) c^z + (1 - \lambda) y c^z \ln c - (1 - \lambda) z c^z \ln c \\
  &= (\lambda + (1 - \lambda)) c^z + (\lambda x + (1 - \lambda) y) c^z \ln c
     - (\lambda + (1 - \lambda)) z c^z \ln c \\
  &= c^z + z c^z \ln c - z c^z \ln c \\
  &= c^z \\
  &= c^{\lambda x + (1 - \lambda) y} ,
\end{aligned}
\]

as we wished to show.


Solution to Problem 12-2 

To sort the strings of S, we first insert them into a radix tree, and then use a preorder
tree walk to extract them in lexicographically sorted order. The tree walk outputs
strings only for nodes that indicate the existence of a string (i.e., those that are
lightly shaded in Figure 12.5 of the text).

Correctness: The preorder ordering is the correct order because:

• Any node’s string is a prefix of all its descendants’ strings and hence belongs
before them in the sorted order (rule 2).

• A node’s left descendants belong before its right descendants because the
corresponding strings are identical up to that parent node, and in the next position the
left subtree’s strings have 0 whereas the right subtree’s strings have 1 (rule 1).

Time: $\Theta(n)$.

• Insertion takes $\Theta(n)$ time, since the insertion of each string takes time
proportional to its length (traversing a path through the tree whose length is the length
of the string), and the sum of all the string lengths is n.

• The preorder tree walk takes $O(n)$ time. It is just like Inorder-Tree-Walk
(it prints the current node and calls itself recursively on the left and right
subtrees), so it takes time proportional to the number of nodes in the tree. The
number of nodes is at most 1 plus the sum (n) of the lengths of the binary
strings in the tree, because a length-i string corresponds to a path through the
root and i other nodes, but a single node may be shared among many string
paths.
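The approach above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: strings are given as Python str values over the characters 0 and 1, the names RadixNode and radix_sort_strings are hypothetical, and a per-node counter stands in for the "lightly shaded" marking. Building each output string by concatenation is done for clarity and costs more than the pointer-based walk analyzed above.

```python
class RadixNode:
    def __init__(self):
        self.child = [None, None]   # child[0]: next bit is 0 (left); child[1]: next bit is 1 (right)
        self.count = 0              # number of input strings that end at this node


def radix_sort_strings(strings):
    """Sort bit strings lexicographically via radix-tree insertion plus a preorder walk."""
    root = RadixNode()
    for s in strings:
        node = root
        for bit in s:               # follow (and create) the path spelled out by s
            b = int(bit)
            if node.child[b] is None:
                node.child[b] = RadixNode()
            node = node.child[b]
        node.count += 1

    out = []
    stack = [(root, "")]
    while stack:                    # preorder: visit node, then left subtree, then right
        node, prefix = stack.pop()
        out.extend([prefix] * node.count)
        for b in (1, 0):            # push right first so that left is popped first
            if node.child[b] is not None:
                stack.append((node.child[b], prefix + str(b)))
    return out
```

A node's own string is emitted before any descendant's (rule 2), and the 0-subtree is emitted before the 1-subtree (rule 1).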
Solution to Problem 12-3


a. The total path length P(T) is defined as $\sum_{x \in T} d(x, T)$. Dividing both
quantities by n gives the desired equation.

b. For any node x in $T_L$, we have $d(x, T_L) = d(x, T) - 1$, since the distance to
the root of $T_L$ is one less than the distance to the root of T. Similarly, for any
node x in $T_R$, we have $d(x, T_R) = d(x, T) - 1$. Thus, if T has n nodes, we
have

\[ P(T) = P(T_L) + P(T_R) + n - 1 , \]

since each of the n nodes of T (except the root) is in either $T_L$ or $T_R$.


c. If T is a randomly built binary search tree, then the root is equally likely to be
any of the n elements in the tree, since the root is the first element inserted.
It follows that the number of nodes in subtree $T_L$ is equally likely to be any
integer in the set {0, 1, ..., n − 1}. The definition of P(n) as the average total
path length of a randomly built binary search tree, along with part (b), gives us
the recurrence

\[ P(n) = \frac{1}{n} \sum_{i=0}^{n-1} \bigl( P(i) + P(n - i - 1) + n - 1 \bigr) . \]
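To make the recurrence concrete, the following sketch evaluates it exactly with rational arithmetic, taking P(0) = 0 as the base case. The function name average_total_path_length is hypothetical; the running prefix sum uses the fact that each P(k) appears once as P(i) and once as P(n − i − 1).

```python
from fractions import Fraction


def average_total_path_length(n_max):
    """Return the list [P(0), P(1), ..., P(n_max)] from the recurrence, with P(0) = 0."""
    P = [Fraction(0)]
    prefix = Fraction(0)                  # running sum P(0) + ... + P(n-1)
    for n in range(1, n_max + 1):
        prefix += P[n - 1]
        # sum over i of (P(i) + P(n-i-1) + n - 1) equals 2 * prefix + n * (n - 1)
        P.append((2 * prefix + n * (n - 1)) / n)
    return P
```

For instance, a one-node tree has total path length 0, and the two possible two-node trees each have total path length 1, which matches the first values this function produces.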

d. Since P(0) = 0, and since for k = 1, 2, ..., n − 1, each term P(k) in the
summation appears once as P(i) and once as P(n − i − 1), we can rewrite the
equation from part (c) as

\[ P(n) = \frac{2}{n} \sum_{k=1}^{n-1} P(k) + \Theta(n) . \]


e. Observe that if, in the recurrence (7.6) in part (c) of Problem 7-2, we replace
E[T(·)] by P(·) and we replace q by k, we get almost the same recurrence as in
part (d) of Problem 12-3. The remaining difference is that in Problem 12-3(d),
the summation starts at 1 rather than 2. Observe, however, that a binary tree
with just one node has a total path length of 0, so that P(1) = 0. Thus, we can
rewrite the recurrence in Problem 12-3(d) as

\[ P(n) = \frac{2}{n} \sum_{k=2}^{n-1} P(k) + \Theta(n) \]

and use the same technique as was used in Problem 7-2 to solve it.
We start by solving part (d) of Problem 7-2: showing that

\[ \sum_{k=2}^{n-1} k \lg k \le \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2 . \]

Following the hint in Problem 7-2(d), we split the summation into two parts:

\[
\sum_{k=2}^{n-1} k \lg k
  = \sum_{k=2}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg k .
\]

The lg k in the first summation on the right is less than lg(n/2) = lg n − 1, and
the lg k in the second summation is less than lg n. Thus,

\[
\begin{aligned}
\sum_{k=2}^{n-1} k \lg k
  &\le (\lg n - 1) \sum_{k=2}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k \\
  &= \lg n \sum_{k=2}^{n-1} k - \sum_{k=2}^{\lceil n/2 \rceil - 1} k \\
  &\le \frac{1}{2} n (n - 1) \lg n - \frac{1}{2} \Bigl( \frac{n}{2} - 1 \Bigr) \frac{n}{2} \\
  &\le \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2
\end{aligned}
\]

if n ≥ 2.

Now we show that the recurrence

\[ P(n) = \frac{2}{n} \sum_{k=2}^{n-1} P(k) + \Theta(n) \]

has the solution $P(n) = O(n \lg n)$. We use the substitution method. Assume
inductively that $P(n) \le a n \lg n + b$ for some positive constants a and b to be
determined. We can pick a and b sufficiently large so that $a n \lg n + b \ge P(1)$.
Then, for n > 1, we have by substitution

\[
\begin{aligned}
P(n) &= \frac{2}{n} \sum_{k=2}^{n-1} P(k) + \Theta(n) \\
     &\le \frac{2}{n} \sum_{k=2}^{n-1} (a k \lg k + b) + \Theta(n) \\
     &= \frac{2a}{n} \sum_{k=2}^{n-1} k \lg k + \frac{2b}{n} (n - 2) + \Theta(n) \\
     &\le \frac{2a}{n} \Bigl( \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2 \Bigr) + \frac{2b}{n} (n - 2) + \Theta(n) \\
     &\le a n \lg n - \frac{a}{4} n + 2b + \Theta(n) \\
     &= a n \lg n + b + \Bigl( \Theta(n) + b - \frac{a}{4} n \Bigr) \\
     &\le a n \lg n + b ,
\end{aligned}
\]

since we can choose a large enough so that $\frac{a}{4} n$ dominates $\Theta(n) + b$. Thus,
$P(n) = O(n \lg n)$.


f. We draw an analogy between inserting an element into a subtree of a binary
search tree and sorting a subarray in quicksort. Observe that once an element x
is chosen as the root of a subtree T, all elements that will be inserted after x
into T will be compared to x. Similarly, observe that once an element y is
chosen as the pivot in a subarray S, all other elements in S will be compared
to y. Therefore, the quicksort implementation in which the comparisons are
the same as those made when inserting into a binary search tree is simply to
consider the pivots in the same order as the order in which the elements are
inserted into the tree.
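The correspondence in part (f) can be checked on small inputs with the sketch below. It is an illustration under stated assumptions: keys are distinct, the function names are hypothetical, and the quicksort variant takes the first element of each subarray as pivot and partitions stably, so that pivots are chosen in insertion order. Each function records the set of (element, element compared against) pairs.

```python
def bst_comparisons(keys):
    """Insert keys in order into an unbalanced BST; return the set of comparisons made."""
    root = None                       # a node is a list [key, left, right]
    pairs = set()
    for k in keys:
        if root is None:
            root = [k, None, None]
            continue
        node = root
        while True:
            pairs.add((k, node[0]))   # k is compared with the root of the current subtree
            side = 1 if k < node[0] else 2
            if node[side] is None:
                node[side] = [k, None, None]
                break
            node = node[side]
    return pairs


def quicksort_comparisons(keys):
    """Quicksort with first-element pivots and stable partitioning; return comparisons."""
    pairs = set()

    def sort(sub):
        if len(sub) <= 1:
            return
        pivot, rest = sub[0], sub[1:]
        for k in rest:
            pairs.add((k, pivot))     # every other element is compared with the pivot
        sort([k for k in rest if k < pivot])
        sort([k for k in rest if k > pivot])

    sort(list(keys))
    return pairs
```

For any sequence of distinct keys the two sets coincide, because the root of each subtree plays exactly the role of the pivot of the corresponding subarray.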
Chapter 13 overview

Red-black trees

• A variation of binary search trees.

• Balanced: height is $O(\lg n)$, where n is the number of nodes.

• Operations will take $O(\lg n)$ time in the worst case.

[These notes are a bit simpler than the treatment in the book, to make them more 
amenable to a lecture situation. Our students first see red-black trees in a course 
that precedes our algorithms course. This set of lecture notes is intended as a 
refresher for the students, bearing in mind that some time may have passed since 
they last saw red-black trees. 

The procedures in this chapter are rather long sequences of pseudocode. You might 
want to make arrangements to project them rather than spending time writing them 
on a board.] 


Red-black trees 

A red-black tree is a binary search tree + 1 bit per node: an attribute color, which 
is either red or black. 

All leaves are empty (nil) and colored black. 

• We use a single sentinel, nil[T], for all the leaves of red-black tree T.

• color[nil[T]] is black.

• The root’s parent is also nil[T].

All other attributes of binary search trees are inherited by red-black trees (key, left,
right, and p). We don’t care about the key in nil[T].

Red-black properties 


[Leave these up on the board.]

1. Every node is either red or black.

2. The root is black. 

3. Every leaf (nil[T]) is black. 

4. If a node is red, then both its children are black. (Hence no two reds in a row 
on a simple path from the root to a leaf.) 

5. For each node, all paths from the node to descendant leaves contain the same 
number of black nodes. 

Example: 



[Nodes with bold outline indicate black nodes. Don’t add heights and black-heights
yet. We won’t bother with drawing nil[T] any more.]
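The five red-black properties listed above can be checked mechanically. The sketch below is an illustration only: it assumes a tree encoded as nested tuples (color, key, left, right) with None standing for the sentinel leaf, and the names black_count and is_red_black are hypothetical. It checks only the coloring properties, not the binary-search-tree ordering of the keys.

```python
RED, BLACK = "red", "black"


def black_count(node):
    """Number of black nodes on every path from node down to a leaf, node included.

    Raises ValueError if property 1, 4, or 5 fails somewhere in the subtree.
    """
    if node is None:                       # property 3: the nil leaf counts as black
        return 1
    color, _key, left, right = node
    if color not in (RED, BLACK):          # property 1
        raise ValueError("node is neither red nor black")
    if color == RED:                       # property 4: a red node has black children
        for child in (left, right):
            if child is not None and child[0] == RED:
                raise ValueError("two reds in a row")
    lc, rc = black_count(left), black_count(right)
    if lc != rc:                           # property 5
        raise ValueError("paths with different numbers of black nodes")
    return lc + (1 if color == BLACK else 0)


def is_red_black(root):
    if root is not None and root[0] != BLACK:   # property 2: the root is black
        return False
    try:
        black_count(root)
    except ValueError:
        return False
    return True
```

The black-height bh(x) defined in the notes is the value returned here minus 1 when x itself is black, since bh(x) does not count x.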


Height of a red-black tree 

• Height of a node is the number of edges in a longest path to a leaf.

• Black-height of a node x: bh(x) is the number of black nodes (including nil[T])
on the path from x to leaf, not counting x. By property 5, black-height is well
defined.

[Now label the example tree with height h and bh values.]

Claim

Any node with height h has black-height $\ge h/2$.

Proof  By property 4, $\le h/2$ nodes on the path from the node to a leaf are red.
Hence $\ge h/2$ are black. ■ (claim)

Claim

The subtree rooted at any node x contains $\ge 2^{bh(x)} - 1$ internal nodes.

Proof  By induction on height of x.

Basis: Height of x = 0 ⇒ x is a leaf ⇒ bh(x) = 0. The subtree rooted at x has 0
internal nodes. $2^0 - 1 = 0$.

Inductive step: Let the height of x be h and bh(x) = b. Any child of x has
height at most h − 1 and black-height either b (if the child is red) or b − 1 (if the
child is black). By the inductive hypothesis, each child has $\ge 2^{bh(x) - 1} - 1$
internal nodes. Thus, the subtree rooted at x contains
$\ge 2 \cdot (2^{bh(x) - 1} - 1) + 1 = 2^{bh(x)} - 1$ internal
nodes. (The +1 is for x itself.) ■ (claim)

Lemma

A red-black tree with n internal nodes has height $\le 2 \lg(n + 1)$.

Proof  Let h and b be the height and black-height of the root, respectively. By the
above two claims,

\[ n \ge 2^b - 1 \ge 2^{h/2} - 1 . \]

Adding 1 to both sides and then taking logs gives $\lg(n + 1) \ge h/2$, which implies
that $h \le 2 \lg(n + 1)$. ■ (lemma)

Operations on red-black trees 

The non-modifying binary-search-tree operations Minimum, Maximum, Successor,
Predecessor, and Search run in O(height) time. Thus, they take
$O(\lg n)$ time on red-black trees.

Insertion and deletion are not so easy. 

If we insert, what color to make the new node? 

• Red? Might violate property 4. 

• Black? Might violate property 5. 

If we delete, thus removing a node, what color was the node that was removed? 

• Red? OK, since we won’t have changed any black-heights, nor will we have 
created two red nodes in a row. Also, cannot cause a violation of property 2, 
since if the removed node was red, it could not have been the root. 

• Black? Could cause there to be two reds in a row (violating property 4), and
can also cause a violation of property 5. Could also cause a violation of
property 2, if the removed node was the root and its child (which becomes the new
root) was red.


Rotations 


• The basic tree-restructuring operation. 

• Needed to maintain red-black trees as balanced binary search trees. 

• Changes the local pointer structure. (Only pointers are changed.)

• Won’t upset the binary-search-tree property.

• Have both left rotation and right rotation. They are inverses of each other.

• A rotation takes a red-black tree and a node within the tree.




Left-Rotate(T, x)
  y ← right[x]                    ▷ Set y.
  right[x] ← left[y]              ▷ Turn y’s left subtree into x’s right subtree.
  if left[y] ≠ nil[T]
      then p[left[y]] ← x
  p[y] ← p[x]                     ▷ Link x’s parent to y.
  if p[x] = nil[T]
      then root[T] ← y
      else if x = left[p[x]]
              then left[p[x]] ← y
              else right[p[x]] ← y
  left[y] ← x                     ▷ Put x on y’s left.
  p[x] ← y

[In the first two printings of the second edition, this procedure contains a bug that
is corrected above (and in the third and subsequent printings). The bug is that the
assignment in line 4 (p[left[y]] ← x) should be performed only when y’s left child
is not the sentinel (which is tested in line 3). The first two printings omitted this
test.]

The pseudocode for Left-Rotate assumes that

• right[x] ≠ nil[T], and

• root’s parent is nil[T].

Pseudocode for Right-Rotate is symmetric: exchange left and right everywhere. 

Example: [Use to demonstrate that rotation maintains inorder ordering of keys.
Node colors omitted.]

• Before rotation: keys of x’s left subtree ≤ 11 ≤ keys of y’s left subtree ≤ 18 ≤
keys of y’s right subtree.

• Rotation makes y’s left subtree into x’s right subtree.

• After rotation: keys of x’s left subtree ≤ 11 ≤ keys of x’s right subtree ≤ 18 ≤
keys of y’s right subtree.

Time: O(1) for both Left-Rotate and Right-Rotate, since a constant number
of pointers are modified.

Notes: 

• Rotation is a very basic operation, also used in AVL trees and splay trees. 

• Some books talk of rotating on an edge rather than on a node. 
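The two rotations can be sketched in Python as follows, mirroring the pseudocode line by line. This is an illustrative sketch, not the book's code: the Node and Tree classes, the inorder helper, and the small example tree built by build_example (keys 9, 11, 14, 18, 19, with x = 11 and y = 18) are assumptions made for the demonstration.

```python
class Node:
    def __init__(self, key, nil=None):
        self.key = key
        self.left = self.right = self.p = nil


class Tree:
    def __init__(self):
        self.nil = Node(None)                   # the single sentinel nil[T]
        self.nil.left = self.nil.right = self.nil.p = self.nil
        self.root = self.nil


def left_rotate(T, x):
    y = x.right                                 # set y; assumes right[x] is not nil[T]
    x.right = y.left                            # y's left subtree becomes x's right subtree
    if y.left is not T.nil:
        y.left.p = x
    y.p = x.p                                   # link x's parent to y
    if x.p is T.nil:
        T.root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x                                  # put x on y's left
    x.p = y


def right_rotate(T, y):
    x = y.left                                  # symmetric: left and right exchanged
    y.left = x.right
    if x.right is not T.nil:
        x.right.p = y
    x.p = y.p
    if y.p is T.nil:
        T.root = x
    elif y is y.p.left:
        y.p.left = x
    else:
        y.p.right = x
    x.right = y
    y.p = x


def inorder(T, node):
    if node is T.nil:
        return []
    return inorder(T, node.left) + [node.key] + inorder(T, node.right)


def build_example():
    """Hypothetical tree: x = 11 at the root, y = 18 as its right child."""
    T = Tree()
    nodes = {k: Node(k, T.nil) for k in (9, 11, 14, 18, 19)}

    def link(parent, child, side):
        setattr(nodes[parent], side, nodes[child])
        nodes[child].p = nodes[parent]

    T.root = nodes[11]
    link(11, 9, "left")
    link(11, 18, "right")
    link(18, 14, "left")
    link(18, 19, "right")
    return T, nodes
```

Rotating left on x and then right on y returns the tree to its original shape, and the inorder key sequence is the same at every stage.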


Insertion 


Start by doing regular binary-search-tree insertion:

RB-Insert(T, z)
  y ← nil[T]
  x ← root[T]
  while x ≠ nil[T]
      do y ← x
         if key[z] < key[x]
            then x ← left[x]
            else x ← right[x]
  p[z] ← y
  if y = nil[T]
      then root[T] ← z
      else if key[z] < key[y]
              then left[y] ← z
              else right[y] ← z
  left[z] ← nil[T]
  right[z] ← nil[T]
  color[z] ← RED
  RB-Insert-Fixup(T, z)

• RB-Insert ends by coloring the new node z red. 

• Then it calls RB-Insert-Fixup because we could have violated a red-black 
property. 

Which property might be violated? 

1. OK. 

2. If z is the root, then there’s a violation. Otherwise, OK. 

3. OK. 

4. If p[z] is red, there’s a violation: both z and p[z] are red.

5. OK. 

Remove the violation by calling RB-Insert-Fixup:

RB-Insert-Fixup(T, z)
  while color[p[z]] = RED
      do if p[z] = left[p[p[z]]]
            then y ← right[p[p[z]]]
                 if color[y] = RED
                    then color[p[z]] ← BLACK                ▷ Case 1
                         color[y] ← BLACK                   ▷ Case 1
                         color[p[p[z]]] ← RED               ▷ Case 1
                         z ← p[p[z]]                        ▷ Case 1
                    else if z = right[p[z]]
                            then z ← p[z]                   ▷ Case 2
                                 Left-Rotate(T, z)          ▷ Case 2
                         color[p[z]] ← BLACK                ▷ Case 3
                         color[p[p[z]]] ← RED               ▷ Case 3
                         Right-Rotate(T, p[p[z]])           ▷ Case 3
            else (same as then clause
                  with “right” and “left” exchanged)
  color[root[T]] ← BLACK

Loop invariant:

At the start of each iteration of the while loop,

a. z is red.

b. There is at most one red-black violation:

• Property 2: z is a red root, or

• Property 4: z and p[z] are both red.

[The book has a third part of the loop invariant, but we omit it for lecture.]

Initialization: We’ve already seen why the loop invariant holds initially.

Termination: The loop terminates because p[z] is black. Hence, property 4 is
OK. Only property 2 might be violated, and the last line fixes it.

Maintenance: We drop out when z is the root (since then p[z] is the sentinel
nil[T], which is black). When we start the loop body, the only violation is of
property 4.

There are 6 cases, 3 of which are symmetric to the other 3. The cases are not
mutually exclusive. We’ll consider cases in which p[z] is a left child.

Let y be z’s uncle (p[z]’s sibling).

Case 1: y is red

• p[p[z]] (z’s grandparent) must be black, since z and p[z] are both red
and there are no other violations of property 4.

• Make p[z] and y black ⇒ now z and p[z] are not both red. But
property 5 might now be violated.

• Make p[p[z]] red ⇒ restores property 5.

• The next iteration has p[p[z]] as the new z (i.e., z moves up 2 levels).

Case 2: y is black, z is a right child

[Figure: case 2 transformed into case 3]

• Left rotate around p[z] ⇒ now z is a left child, and both z and p[z] are
red.

• Takes us immediately to case 3.

Case 3: y is black, z is a left child

• Make p[z] black and p[p[z]] red.

• Then right rotate on p[p[z]].

• No longer have 2 reds in a row.

• p[z] is now black ⇒ no more iterations.

Analysis 

$O(\lg n)$ time to get through RB-Insert up to the call of RB-Insert-Fixup.
Within RB-Insert-Fixup:

• Each iteration takes O(1) time.

• Each iteration is either the last one or it moves z up 2 levels.

• $O(\lg n)$ levels ⇒ $O(\lg n)$ time.

• Also note that there are at most 2 rotations overall.

Thus, insertion into a red-black tree takes $O(\lg n)$ time.
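Insertion and its fixup can be sketched compactly in Python. The sketch below follows RB-Insert and RB-Insert-Fixup, but it is an illustration under stated assumptions rather than the book's code: the class and method names are hypothetical, insert takes a key instead of a prepared node, and a single _rotate helper parameterized by direction replaces the separate Left-Rotate and Right-Rotate so that the symmetric cases need not be written twice. The helpers height, check_black_count, and inorder exist only to check the result.

```python
import math

RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, nil):
        self.key, self.color = key, color
        self.left = self.right = self.p = nil


class RBTree:
    def __init__(self):
        self.nil = Node(None, BLACK, None)        # sentinel nil[T], colored black
        self.nil.left = self.nil.right = self.nil.p = self.nil
        self.root = self.nil

    def _rotate(self, x, side):
        """side='left' is Left-Rotate(T, x); side='right' is Right-Rotate(T, x)."""
        other = "right" if side == "left" else "left"
        y = getattr(x, other)
        setattr(x, other, getattr(y, side))
        if getattr(y, side) is not self.nil:
            getattr(y, side).p = x
        y.p = x.p
        if x.p is self.nil:
            self.root = y
        elif x is x.p.left:
            x.p.left = y
        else:
            x.p.right = y
        setattr(y, side, x)
        x.p = y

    def insert(self, key):
        z = Node(key, RED, self.nil)              # the new node starts out red
        y, x = self.nil, self.root
        while x is not self.nil:
            y = x
            x = x.left if key < x.key else x.right
        z.p = y
        if y is self.nil:
            self.root = z
        elif key < y.key:
            y.left = z
        else:
            y.right = z
        self._insert_fixup(z)

    def _insert_fixup(self, z):
        while z.p.color == RED:
            side = "left" if z.p is z.p.p.left else "right"
            other = "right" if side == "left" else "left"
            y = getattr(z.p.p, other)             # z's uncle
            if y.color == RED:                    # case 1: recolor and move z up 2 levels
                z.p.color = y.color = BLACK
                z.p.p.color = RED
                z = z.p.p
            else:
                if z is getattr(z.p, other):      # case 2: rotate into case 3
                    z = z.p
                    self._rotate(z, side)
                z.p.color = BLACK                 # case 3: recolor, rotate grandparent
                z.p.p.color = RED
                self._rotate(z.p.p, other)
        self.root.color = BLACK


def height(T, x):
    """Edges on a longest path from x down to a (sentinel) leaf."""
    return 0 if x is T.nil else 1 + max(height(T, x.left), height(T, x.right))


def check_black_count(T, x):
    """Black nodes on every downward path from x (x included); raises on violations."""
    if x is T.nil:
        return 1
    if x.color == RED and RED in (x.left.color, x.right.color):
        raise ValueError("property 4 violated")
    lc, rc = check_black_count(T, x.left), check_black_count(T, x.right)
    if lc != rc:
        raise ValueError("property 5 violated")
    return lc + (x.color == BLACK)


def inorder(T, x):
    return [] if x is T.nil else inorder(T, x.left) + [x.key] + inorder(T, x.right)
```

Inserting keys in sorted order, which would produce a chain in an ordinary binary search tree, is a convenient way to see the height bound from the lemma in action.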


Deletion 


Start by doing regular binary-search-tree deletion: 

RB-Delete(T, z)
  if left[z] = nil[T] or right[z] = nil[T]
      then y ← z
      else y ← Tree-Successor(z)
  if left[y] ≠ nil[T]
      then x ← left[y]
      else x ← right[y]
  p[x] ← p[y]
  if p[y] = nil[T]
      then root[T] ← x
      else if y = left[p[y]]
              then left[p[y]] ← x
              else right[p[y]] ← x
  if y ≠ z
      then key[z] ← key[y]
           copy y’s satellite data into z
  if color[y] = BLACK
      then RB-Delete-Fixup(T, x)
  return y

• y is the node that was actually spliced out.

• x is either

• y’s sole non-sentinel child before y was spliced out, or

• the sentinel, if y had no children.

In both cases, p[x] is now the node that was previously y’s parent.
If y is black, we could have violations of red-black properties:

1. OK.

2. If y is the root and x is red, then the root has become red.

3. OK.

4. Violation if p[y] and x are both red.

5. Any path containing y now has 1 fewer black node.

• Correct by giving x an “extra black.”

• Add 1 to count of black nodes on paths containing x.

• Now property 5 is OK, but property 1 is not.

• x is either doubly black (if color[x] = BLACK) or red & black (if color[x] =
RED).

• The attribute color[x] is still either RED or BLACK. No new values for color
attribute.

• In other words, the extra blackness on a node is by virtue of x pointing to the
node.

Remove the violations by calling RB-Delete-Fixup:


RB-Delete-Fixup(T, x)
  while x ≠ root[T] and color[x] = BLACK
      do if x = left[p[x]]
            then w ← right[p[x]]
                 if color[w] = RED
                    then color[w] ← BLACK                        ▷ Case 1
                         color[p[x]] ← RED                       ▷ Case 1
                         Left-Rotate(T, p[x])                    ▷ Case 1
                         w ← right[p[x]]                         ▷ Case 1
                 if color[left[w]] = BLACK and color[right[w]] = BLACK
                    then color[w] ← RED                          ▷ Case 2
                         x ← p[x]                                ▷ Case 2
                    else if color[right[w]] = BLACK
                            then color[left[w]] ← BLACK          ▷ Case 3
                                 color[w] ← RED                  ▷ Case 3
                                 Right-Rotate(T, w)              ▷ Case 3
                                 w ← right[p[x]]                 ▷ Case 3
                         color[w] ← color[p[x]]                  ▷ Case 4
                         color[p[x]] ← BLACK                     ▷ Case 4
                         color[right[w]] ← BLACK                 ▷ Case 4
                         Left-Rotate(T, p[x])                    ▷ Case 4
                         x ← root[T]                             ▷ Case 4
            else (same as then clause with “right” and “left” exchanged)
  color[x] ← BLACK


Idea: Move the extra black up the tree until

• x points to a red & black node ⇒ turn it into a black node,

• x points to the root ⇒ just remove the extra black, or

• we can do certain rotations and recolorings and finish.

Within the while loop:

• x always points to a nonroot doubly black node.

• w is x’s sibling.

• w cannot be nil[T], since that would violate property 5 at p[x].

There are 8 cases, 4 of which are symmetric to the other 4. As with insertion, the
cases are not mutually exclusive. We’ll look at cases in which x is a left child.

Case 1: w is red

• w must have black children.

• Make w black and p[x] red.

• Then left rotate on p[x].

• New sibling of x was a child of w before rotation ⇒ must be black.

• Go immediately to case 2, 3, or 4.

Case 2: w is black and both of w’s children are black

[Node with gray outline is of unknown color, denoted by c.]

• Take 1 black off x (⇒ singly black) and off w (⇒ red).

• Move that black to p[x].

• Do the next iteration with p[x] as the new x.

• If entered this case from case 1, then p[x] was red ⇒ new x is red & black
⇒ color attribute of new x is RED ⇒ loop terminates. Then new x is made
black in the last line.

Case 3: w is black, w’s left child is red, and w’s right child is black

• Make w red and w’s left child black.

• Then right rotate on w.

• New sibling w of x is black with a red right child ⇒ case 4.

Case 4: w is black and w’s right child is red (w’s left child may be either color)

[Now there are two nodes of unknown colors, denoted by c and c′.]

• Make w be p[x]’s color (c).

• Make p[x] black and w’s right child black.

• Then left rotate on p[x].

• Remove the extra black on x (⇒ x is now singly black) without violating
any red-black properties.

• All done. Setting x to root causes the loop to terminate.


Analysis 

$O(\lg n)$ time to get through RB-Delete up to the call of RB-Delete-Fixup.
Within RB-Delete-Fixup:

• Case 2 is the only case in which more iterations occur.

• x moves up 1 level.

• Hence, $O(\lg n)$ iterations.

• Each of cases 1, 3, and 4 has 1 rotation ⇒ ≤ 3 rotations in all.

• Hence, $O(\lg n)$ time.

[In Chapter 14, we’ll see a theorem that relies on red-black tree operations causing
at most a constant number of rotations. This is where red-black trees enjoy an
advantage over AVL trees: in the worst case, an operation on an n-node AVL tree
causes $\Omega(\lg n)$ rotations.]




Solutions for Chapter 13: 
Red-Black Trees 


Solution to Exercise 13.1-3 

If we color the root of a relaxed red-black tree black but make no other changes, 
the resulting tree is a red-black tree. Not even any black-heights change. 


Solution to Exercise 13.1-4 

After absorbing each red node into its black parent, the degree of each black
node is

• 2, if both children were already black, 

• 3, if one child was black and one was red, or 

• 4, if both children were red. 

All leaves of the resulting tree have the same depth. 


Solution to Exercise 13.1-5 

In the longest path, at least every other node is black. In the shortest path, at most 
every node is black. Since the two paths contain equal numbers of black nodes, the 
length of the longest path is at most twice the length of the shortest path. 

We can say this more precisely, as follows: 

Since every path contains bh(x) black nodes, even the shortest path from x to a
descendant leaf has length at least bh(x). By definition, the longest path from x
to a descendant leaf has length height(x). Since the longest path has bh(x) black
nodes and at least half the nodes on the longest path are black (by property 4),
bh(x) ≥ height(x)/2, so


length of longest path = height(x) ≤ 2 · bh(x) ≤ twice length of shortest path .


Solution to Exercise 13.2-4

Since the exercise asks about binary search trees rather than the more specific red- 
black trees, we assume here that leaves are full-fledged nodes, and we ignore the 
sentinels. 

Taking the book’s hint, we start by showing that with at most n — 1 right rotations, 
we can convert any binary search tree into one that is just a right-going chain. 

The idea is simple. Let us define the right spine as the root and all descendants of 
the root that are reachable by following only right pointers from the root. A binary 
search tree that is just a right-going chain has all n nodes in the right spine. 

As long as the tree is not just a right spine, repeatedly find some node y on the right 
spine that has a non-leaf left child x and then perform a right rotation on y: 



(In the above figure, note that any of α, β, and γ can be an empty subtree.)

Observe that this right rotation adds x to the right spine, and no other nodes leave 
the right spine. Thus, this right rotation increases the number of nodes in the right 
spine by 1. Any binary search tree starts out with at least one node—the root—in 
the right spine. Moreover, if there are any nodes not on the right spine, then at least 
one such node has a parent on the right spine. Thus, at most n — 1 right rotations 
are needed to put all nodes in the right spine, so that the tree consists of a single 
right-going chain. 

If we knew the sequence of right rotations that transforms an arbitrary binary search
tree T to a single right-going chain T′, then we could perform this sequence in
reverse, turning each right rotation into its inverse left rotation, to transform T′
back into T.

Therefore, here is how we can transform any binary search tree $T_1$ into any other
binary search tree $T_2$. Let T′ be the unique right-going chain consisting of the
nodes of $T_1$ (which is the same as the nodes of $T_2$). Let
$r = \langle r_1, r_2, \ldots, r_k \rangle$ be a sequence of right rotations that transforms
$T_1$ to T′, and let $r' = \langle r'_1, r'_2, \ldots, r'_{k'} \rangle$ be a sequence of right
rotations that transforms $T_2$ to T′. We know that there exist
sequences r and r′ with k, k′ ≤ n − 1. For each right rotation $r'_i$, let $l'_i$ be the
corresponding inverse left rotation. Then the sequence
$\langle r_1, r_2, \ldots, r_k, l'_{k'}, l'_{k'-1}, \ldots, l'_2, l'_1 \rangle$
transforms $T_1$ to $T_2$ in at most k + k′ ≤ 2n − 2 rotations.
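The first half of this argument, flattening a tree into a right-going chain with at most n − 1 right rotations, can be sketched as follows. This is an illustration under stated assumptions: nodes carry no parent pointers or sentinels, the names Node and to_right_chain are hypothetical, and a dummy node above the root simplifies relinking. The sketch walks down the right spine and rotates whenever the current spine node has a left child.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def to_right_chain(root):
    """Flatten a BST into a right-going chain; return (new_root, rotations_used)."""
    dummy = Node(None, right=root)      # stands in for the parent of the root
    parent, y, rotations = dummy, root, 0
    while y is not None:
        if y.left is not None:
            x = y.left                  # right rotation on y: x joins the right spine
            y.left = x.right
            x.right = y
            parent.right = x
            y = x
            rotations += 1
        else:
            parent, y = y, y.right      # y has no left child; move down the spine
    return dummy.right, rotations


def chain_keys(root):
    keys = []
    while root is not None:
        assert root.left is None        # a right-going chain has no left children
        keys.append(root.key)
        root = root.right
    return keys
```

Each rotation adds exactly one node to the right spine, so the number of rotations equals n minus the initial length of the right spine.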



Solution to Exercise 13.3-3 

In Figure 13.5, nodes A, B, and D have black-height k + 1 in all cases, because each
of their subtrees has black-height k and a black root. Node C has black-height k + 1
on the left (because its red children have black-height k + 1) and black-height k + 2
on the right (because its black children have black-height k + 1).




In Figure 13.6, nodes A, B, and C have black-height k + 1 in all cases. At left and 
in the middle, each of A’s and B’s subtrees has black-height k and a black root, 
while C has one such subtree and a red child with black-height k + 1. At the right, 
each of A’s and C’s subtrees has black-height k and a black root, while B's red 
children each have black-height k + 1. 



Property 5 is preserved by the transformations. We have shown above that the 
black-height is well-defined within the subtrees pictured, so property 5 is preserved 
within those subtrees. Property 5 is preserved for the tree containing the subtrees 
pictured, because every path through these subtrees to a leaf contributes k + 2 black 
nodes. 


Solution to Exercise 13.3-4 

Colors are set to red only in cases 1 and 3, and in both situations, it is p[p[z]] that
is reddened. If p[p[z]] is the sentinel, then p[z] is the root. By part (b) of the
loop invariant and line 1 of RB-Insert-Fixup, if p[z] is the root, then we have
dropped out of the loop. The only subtlety is in case 2, where we set z ← p[z]
before coloring p[p[z]] red. Because we rotate before the recoloring, the identity
of p[p[z]] is the same before and after case 2, so there’s no problem.


Solution to Exercise 13.4-6

Case 1 occurs only if x’s sibling w is red. If p[x] were red, then there would be
two reds in a row, namely p[x] (which is also p[w]) and w, and we would have
had these two reds in a row even before calling RB-Delete.


Solution to Exercise 13.4-7 


No, the red-black tree will not necessarily be the same. Here are two examples: 
one in which the tree’s shape changes, and one in which the shape remains the 
same but the node colors change. 



Solution to Problem 13-1 

a. When inserting key k, all nodes on the path from the root to the added node 
(a new leaf) must change, since the need for a new child pointer propagates up 
from the new node to all of its ancestors. 

When deleting a node, let y be the node actually removed and z be the node 
given to the delete procedure. 

• If y has at most one child, it will be removed or spliced out (see Figure 12.4, 
parts (a) and (b), where y and z are the same node). All ancestors of y will 
be changed. (As with insertion, the need for a new child pointer propagates 
up from the removed node.) 

• If z has two children, y is its successor; it is y that will be spliced out and 
moved to z’s position (see Figure 12.4(c)). Therefore all ancestors of both z 
and y must be changed. (Actually, this is just all ancestors of z, since z is an 
ancestor of y in this case.) 

In either case, y’s children (if any) are unchanged, because we have assumed
that there is no parent field.

b. We assume that we can call two procedures:

• Make-New-Node(k) creates a new node whose key field has value k and
with left and right fields NIL, and it returns a pointer to the new node.

• Copy-Node(x) creates a new node whose key, left, and right fields have the
same values as those of node x, and it returns a pointer to the new node.

Here are two ways to write Persistent-Tree-Insert. The first is a version
of Tree-Insert, modified to create new nodes along the path to where the
new node will go, and to not use parent fields. It returns the root of the new
tree.

Persistent-Tree-Insert(T, k)
  z ← Make-New-Node(k)
  new-root ← Copy-Node(root[T])
  y ← NIL
  x ← new-root
  while x ≠ NIL
      do y ← x
         if key[z] < key[x]
            then x ← Copy-Node(left[x])
                 left[y] ← x
            else x ← Copy-Node(right[x])
                 right[y] ← x
  if y = NIL
      then new-root ← z
      else if key[z] < key[y]
              then left[y] ← z
              else right[y] ← z
  return new-root

The second is a rather elegant recursive procedure. It must be called with
root[T] instead of T as its first argument (because the recursive calls pass a
node for this argument), and it returns the root of the new tree.

Persistent-Tree-Insert(r, k)
  if r = NIL
      then x ← Make-New-Node(k)
      else x ← Copy-Node(r)
           if k < key[r]
              then left[x] ← Persistent-Tree-Insert(left[r], k)
              else right[x] ← Persistent-Tree-Insert(right[r], k)
  return x
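The recursive version translates almost directly into Python. In the sketch below, which is an illustration rather than the manual's code, nodes are immutable named tuples, so Copy-Node followed by assigning one child becomes a single _replace call; the names Node, persistent_insert, and inorder are assumptions.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def persistent_insert(r, k):
    """Return the root of a new tree containing k; the tree rooted at r is unchanged."""
    if r is None:
        return Node(k)                                          # Make-New-Node(k)
    if k < r.key:
        return r._replace(left=persistent_insert(r.left, k))    # copy r, new left child
    return r._replace(right=persistent_insert(r.right, k))      # copy r, new right child


def inorder(r):
    return [] if r is None else inorder(r.left) + [r.key] + inorder(r.right)
```

Only the nodes on the path from the root to the new leaf are copied; every subtree off that path is shared between the old and new versions.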

c. Like Tree-Insert, Persistent-Tree-Insert does a constant amount of
work at each node along the path from the root to the new node. Since the
length of the path is at most h, it takes O(h) time.

Since it allocates a new node (a constant amount of space) for each ancestor of
the inserted node, it also needs O(h) space.

d. If there were parent fields, then because of the new root, every node of the tree
would have to be copied when a new node is inserted. To see why, observe
that the children of the root would change to point to the new root, then their
children would change to point to them, and so on. Since there are n nodes, this
change would cause insertion to create $\Omega(n)$ new nodes and to take $\Omega(n)$ time.

e. From parts (a) and (c), we know that insertion into a persistent binary search
tree of height h, like insertion into an ordinary binary search tree, takes
worst-case time O(h). A red-black tree has h = O(lg n), so insertion into an ordinary
red-black tree takes O(lg n) time. We need to show that if the red-black tree is
persistent, insertion can still be done in O(lg n) time. To do this, we will need
to show two things:

• How to still find the parent pointers we need in O(1) time without using
a parent field. We cannot use a parent field because a persistent tree with
parent fields uses Ω(n) time for insertion (by part (d)).

• That the additional node changes made during red-black tree operations (by
rotation and recoloring) don’t cause more than O(lg n) additional nodes to
change.

Each parent pointer needed during insertion can be found in O(1) time without
having a parent field as follows:

To insert into a red-black tree, we call RB-Insert, which in turn calls
RB-Insert-Fixup. Make the same changes to RB-Insert as we made to
Tree-Insert for persistence. Additionally, as RB-Insert walks down the tree to
find the place to insert the new node, have it build a stack of the nodes it
traverses and pass this stack to RB-Insert-Fixup. RB-Insert-Fixup needs
parent pointers to walk back up the same path, and at any given time it needs
parent pointers only to find the parent and grandparent of the node it is working
on. As RB-Insert-Fixup moves up the stack of parents, it needs only parent
pointers that are at known locations a constant distance away in the stack. Thus,
the parent information can be found in O(1) time, just as if it were stored in a
parent field.

Rotation and recoloring change nodes as follows: 

• RB-Insert-Fixup performs at most 2 rotations, and each rotation changes
the child pointers in 3 nodes (the node around which we rotate, that node’s
parent, and one of the children of the node around which we rotate). Thus, at
most 6 nodes are directly modified by rotation during RB-Insert-Fixup. In
a persistent tree, all ancestors of a changed node are copied, so
RB-Insert-Fixup’s rotations take O(lg n) time to change nodes due to rotation.
(Actually, the changed nodes in this case share a single O(lg n)-length path of
ancestors.)

• RB-Insert-Fixup recolors some of the inserted node’s ancestors, which
are being changed anyway in persistent insertion, and some children of
ancestors (the “uncles” referred to in the algorithm description). There are
O(lg n) ancestors, hence O(lg n) color changes of uncles.
Recoloring uncles doesn’t cause any additional node changes due to
persistence, because the ancestors of the uncles are the same nodes (ancestors of
the inserted node) that are being changed anyway due to persistence. Thus,
recoloring does not affect the O(lg n) running time, even with persistence.

We could show similarly that deletion in a persistent tree also takes worst-case
time O(h).

• We already saw in part (a) that O(h) nodes change.

• We could write a persistent RB-Delete procedure that runs in O(h) time,
analogous to the changes we made for persistence in insertion. But to do so
without using parent pointers we need to walk down the tree to the node to be
deleted, to build up a stack of parents as discussed above for insertion. This
is a little tricky if the set’s keys are not distinct, because in order to find the
path to the node to delete (a particular node with a given key) we have to
make some changes to how we store things in the tree, so that duplicate keys
can be distinguished. The easiest way is to have each key take a second part
that is unique, and to use this second part as a tiebreaker when comparing
keys.

Then the problem of showing that deletion needs only O(lg n) time in a
persistent red-black tree is the same as for insertion.

• As for insertion, we can show that the parents needed by RB-Delete-Fixup
can be found in O(1) time (using the same technique as for insertion).

• Also, RB-Delete-Fixup performs at most 3 rotations, which as discussed
above for insertion requires O(lg n) time to change nodes due to persistence.
It also does O(lg n) color changes, which (as for insertion) take only O(lg n)
time to change ancestors due to persistence, because the number of copied
nodes is O(lg n).




Lecture Notes for Chapter 14: Augmenting Data Structures

Chapter 14 overview

We’ll be looking at methods for designing algorithms. In some cases, the design 
will be intermixed with analysis. In other cases, the analysis is easy, and it’s the 
design that’s harder. 


Augmenting data structures 

• It’s unusual to have to design an all-new data structure from scratch. 

• It’s more common to take a data structure that you know and store additional 
information in it. 

• With the new information, the data structure can support new operations. 

• But... you have to figure out how to correctly maintain the new information 
without loss of efficiency. 

We’ll look at a couple of situations in which we augment red-black trees. 


Dynamic order statistics 

We want to support the usual dynamic-set operations from R-B trees, plus:

• OS-Select(x, i): return pointer to node containing the ith smallest key of the
subtree rooted at x.

• OS-Rank(T, x): return the rank of x in the linear order determined by an
inorder walk of T.

Augment by storing in each node x:

size[x] = # of nodes in subtree rooted at x.

• Includes x itself.

• Does not include leaves (sentinels).

Define for sentinel size[nil[T]] = 0.

Then size[x] = size[left[x]] + size[right[x]] + 1.

[Example above: Ignore colors, but legal coloring shown with “R” and “B”
notations. Values of i and r are for the example below.]

Note: OK for keys to not be distinct. Rank is defined with respect to position in
inorder walk. So if we changed D to C, rank of original C is 2, rank of D changed
to C is 3.

OS-Select(x, i)
r ← size[left[x]] + 1
if i = r
   then return x
elseif i < r
   then return OS-Select(left[x], i)
else return OS-Select(right[x], i − r)

Initial call: OS-Select(root[T], i)

Try OS-Select(root[T], 5). [Values shown in figure above. Returns node whose
key is H.]

Correctness: r = rank of x within subtree rooted at x.

• If i = r, then we want x.

• If i < r, then ith smallest element is in x’s left subtree, and we want the ith
smallest element in the subtree.

• If i > r, then ith smallest element is in x’s right subtree, but subtract off the r
elements in x’s subtree that precede those in x’s right subtree.

• Like the randomized Select algorithm!

Analysis: Each recursive call goes down one level. Since R-B tree has O(lg n)
levels, have O(lg n) calls ⇒ O(lg n) time.
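A small Python sketch of OS-Select, under stated assumptions: the `OSNode` class and the `build` helper are hypothetical names introduced only for this example, `None` stands in for the sentinel, and `build` constructs a balanced tree from sorted keys rather than a real red-black tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OSNode:
    key: str
    left: Optional["OSNode"] = None
    right: Optional["OSNode"] = None
    size: int = 1


def size(x):
    """Subtree size; a missing child (None) acts as the sentinel with size 0."""
    return 0 if x is None else x.size


def build(keys):
    """Build a balanced tree from sorted keys and fill in the size fields."""
    if not keys:
        return None
    mid = len(keys) // 2
    x = OSNode(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))
    x.size = size(x.left) + size(x.right) + 1
    return x


def os_select(x, i):
    """Return the node with the i-th smallest key in x's subtree (1 <= i <= size)."""
    r = size(x.left) + 1          # rank of x within its own subtree
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)
```

The function assumes 1 ≤ i ≤ size[x]; it does not guard against out-of-range ranks.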

OS-Rank(T, x)
r ← size[left[x]] + 1
y ← x
while y ≠ root[T]
    do if y = right[p[y]]
          then r ← r + size[left[p[y]]] + 1
       y ← p[y]
return r

Demo: Node D.

Why does this work? 

Loop invariant: At start of each iteration of while loop, r = rank of key[x]
in subtree rooted at y.

Initialization: Initially, r = rank of key[x] in subtree rooted at x, and y = x.

Termination: Loop terminates when y = root[T] ⇒ subtree rooted at y is entire
tree. Therefore, r = rank of key[x] in entire tree.

Maintenance: At end of each iteration, set y ← p[y]. So, show that if r = rank
of key[x] in subtree rooted at y at start of loop body, then r = rank of key[x] in
subtree rooted at p[y] at end of loop body.



[r = # of nodes in subtree rooted at y preceding x in inorder walk]

Must add nodes in y’s sibling’s subtree.

• If y is a left child, its sibling’s subtree follows all nodes in y’s subtree ⇒
don’t change r.

• If y is a right child, all nodes in y’s sibling’s subtree precede all nodes in y’s
subtree ⇒ add size of y’s sibling’s subtree, plus 1 for p[y], into r.

Analysis: y goes up one level in each iteration ⇒ O(lg n) time.
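The upward walk of OS-Rank can be sketched in Python as follows. The `Node` class and the plain, unbalanced `insert` helper are assumptions made for this example (a real order-statistic tree would be a red-black tree); `insert` is included only to produce a tree with correct `size` and `p` fields.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None
        self.size = 1


def size(x):
    """Subtree size; None plays the role of the sentinel with size 0."""
    return 0 if x is None else x.size


def insert(root, z):
    """Unbalanced BST insert that maintains size and p; returns the root."""
    y, x = None, root
    while x is not None:
        x.size += 1               # z will be a descendant of every node visited
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        return z
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root


def os_rank(root, x):
    """Rank of node x in the inorder walk of the tree rooted at root."""
    r = size(x.left) + 1
    y = x
    while y is not root:
        if y is y.p.right:
            # Everything in the left sibling's subtree, plus the parent, precedes x.
            r += size(y.p.left) + 1
        y = y.p
    return r
```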


Maintaining subtree sizes 

• Need to maintain size[x] fields during insert and delete operations.

• Need to maintain them efficiently. Otherwise, might have to recompute them
all, at a cost of Ω(n).

Will see how to maintain without increasing O(lg n) time for insert and delete.

Insert: 

• During pass downward, we know that the new node will be a descendant of
each node we visit, and only of these nodes. Therefore, increment size field of
each node visited.

• Then there’s the fixup pass:

  • Goes up the tree.

  • Changes colors O(lg n) times.

  • Performs ≤ 2 rotations.

• Color changes don’t affect subtree sizes.

• Rotations do!

• But we can determine new sizes based on old sizes and sizes of children.

Left-Rotate(T, x)

[Figure: left rotation around x, where y = right[x] becomes the parent of x.]

size[y] ← size[x]
size[x] ← size[left[x]] + size[right[x]] + 1

• Similar for right rotation.

• Therefore, can update in O(1) time per rotation ⇒ O(1) time spent updating
size fields during fixup.

• Therefore, O(lg n) to insert.
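The two size assignments above can be seen in context in the following Python sketch of a left rotation. The `Node` class (whose constructor wires up parent pointers and sizes) is an assumption for this example, and `None` stands in for the sentinel.

```python
def size(x):
    """Subtree size; None plays the role of the sentinel with size 0."""
    return 0 if x is None else x.size


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.p = self
        self.size = size(left) + size(right) + 1


def left_rotate(root, x):
    """Rotate left around x (x.right must exist); return the possibly new root."""
    y = x.right
    x.right = y.left              # y's left subtree becomes x's right subtree
    if y.left is not None:
        y.left.p = x
    y.p = x.p
    if x.p is None:
        root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x
    x.p = y
    y.size = x.size               # y now roots the subtree x used to root
    x.size = size(x.left) + size(x.right) + 1
    return root
```

Only the sizes of x and y change, so the update costs O(1) per rotation.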

Delete: Also 2 phases: 

1. Splice out some node y. 

2. Fixup. 

After splicing out y, traverse a path y → root, decrementing size in each node on
path. O(lg n) time.

During fixup, like insertion, only color changes and rotations.

• ≤ 3 rotations ⇒ O(1) time spent updating size fields during fixup.

• Therefore, O(lg n) to delete.

Done! 


Methodology for augmenting a data structure 

1. Choose an underlying data structure. 

2. Determine additional information to maintain.

3. Verify that we can maintain additional information for existing data structure
operations.

4. Develop new operations. 

Don’t need to do these steps in strict order! Usually do a little of each, in parallel. 
How did we do them for OS trees? 

1. R-B tree. 

2. size[x].

3. Showed how to maintain size during insert and delete. 

4. Developed OS-Select and OS-Rank. 

Red-black trees are particularly amenable to augmentation. 

Theorem 

Augment a R-B tree with field f, where f[x] depends only on information in x,
left[x], and right[x] (including f[left[x]] and f[right[x]]). Then can maintain
values of f in all nodes during insert and delete without affecting O(lg n)
performance.

Proof Since f[x] depends only on x and its children, when we alter information
in x, changes propagate only upward (to p[x], p[p[x]], ..., root).

Height = O(lg n) ⇒ O(lg n) updates, at O(1) each.

Insertion: Insert a node as child of existing node. Even if can’t update f on way
down, can go up from inserted node to update f. During fixup, only changes come
from color changes (no effect on f) and rotations. Each rotation affects f of ≤ 3
nodes (x, y, and parent), and can recompute each in O(1) time. Then, if necessary,
propagate changes up the tree. Therefore, O(lg n) time per rotation. Since ≤ 2
rotations, O(lg n) time to update f during fixup.

Delete: Same idea. After splicing out a node, go up from there to update f. Fixup
has ≤ 3 rotations. O(lg n) per rotation ⇒ O(lg n) to update f during fixup.

■ (theorem)

For some attributes, can get away with O(1) per rotation. Example: size field.


Interval trees 

Maintain a set of intervals. For instance, time intervals. 

i = [7, 10]: low[i] = 7, high[i] = 10

[Figure: intervals drawn on a number line: [7, 10]; [5, 11] and [17, 19]; [4, 8], [15, 18], and [21, 23].]

[leave on board]

Operations

• Interval-Insert(T, x): int[x] already filled in.

• Interval-Delete(T, x)

• Interval-Search(T, i): return pointer to a node x in T such that int[x]
overlaps interval i. Any overlapping node in T is OK. Return pointer to sentinel
nil[T] if no overlapping node in T.

Interval i has low[i], high[i].

i and j overlap if and only if low[i] ≤ high[j] and low[j] ≤ high[i].

(Go through examples of proper inclusion, overlap without proper inclusion, no
overlap.)

Another way: i and j don’t overlap if and only if: low[i] > high[j] or low[j] >
high[i]. [leave this on board]

Recall the 4-part methodology.


For interval trees 

1. Use R-B trees. 

• Each node x contains interval int[x].

• Key is low endpoint (low[int[x]]).

• Inorder walk would list intervals sorted by low endpoint.

2. Each node x contains

max[x] = max endpoint value in subtree rooted at x.

[leave on board]

\[
\mathit{max}[x] = \max \left\{
\begin{array}{l}
\mathit{high}[\mathit{int}[x]] , \\
\mathit{max}[\mathit{left}[x]] , \\
\mathit{max}[\mathit{right}[x]]
\end{array}
\right\}
\]

Could max[left[x]] > max[right[x]]? Sure. Position in tree is determined only
by low endpoints, not high endpoints.

3. Maintaining the information. 









• This is easy: max[x] depends only on:

  • information in x: high[int[x]]

  • information in left[x]: max[left[x]]

  • information in right[x]: max[right[x]]

• Apply the theorem.

• In fact, can update max on way down during insertion, and in O(1) time per
rotation.

4. Developing new operations. 

Interval-Search(T, i)
x ← root[T]
while x ≠ nil[T] and i does not overlap int[x]
    do if left[x] ≠ nil[T] and max[left[x]] ≥ low[i]
          then x ← left[x]
          else x ← right[x]
return x

Examples: Search for [14, 16] and [12, 14].

Time: O(lg n).
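Interval-Search can be sketched in Python as below. The `INode` class, the `overlaps` helper, and the example tree used to exercise it are assumptions for illustration (the tree is hand-built and keyed on low endpoints, not produced by red-black insertion); `None` stands in for nil[T].

```python
class INode:
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        # max = largest endpoint anywhere in this node's subtree
        self.max = max([high] + [c.max for c in (left, right) if c is not None])


def overlaps(a_low, a_high, b_low, b_high):
    """Closed intervals overlap iff each starts no later than the other ends."""
    return a_low <= b_high and b_low <= a_high


def interval_search(root, low, high):
    """Return some node whose interval overlaps [low, high], or None."""
    x = root
    while x is not None and not overlaps(low, high, x.low, x.high):
        if x.left is not None and x.left.max >= low:
            x = x.left            # an overlap, if any exists, can be found on the left
        else:
            x = x.right           # the left subtree cannot contain an overlap
    return x


# Hypothetical example tree, keyed on low endpoints.
EXAMPLE = INode(16, 21,
                INode(8, 9, INode(5, 8, INode(0, 3), INode(6, 10)), INode(15, 23)),
                INode(25, 30, INode(17, 19, None, INode(19, 20)), INode(26, 26)))
```

On this example tree, a search for [22, 25] walks left twice and then right to reach [15, 23], while a search for [11, 14] falls off the tree and returns `None`.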

Correctness: Key idea: need check only 1 of node’s 2 children. 

Theorem 

If search goes right, then either: 

• There is an overlap in right subtree, or 

• There is no overlap in either subtree. 

If search goes left, then either: 

• There is an overlap in left subtree, or 

• There is no overlap in either subtree. 

Proof If search goes right: 

• If there is an overlap in right subtree, done. 

• If there is no overlap in right, show there is no overlap in left. Went right
because

  • left[x] = nil[T] ⇒ no overlap in left.

  OR

  • max[left[x]] < low[i] ⇒ no overlap in left
    (max[left[x]] = highest endpoint in left).

If search goes left:

• If there is an overlap in left subtree, done. 

• If there is no overlap in left, show there is no overlap in right.

• Went left because:
low[i] ≤ max[left[x]]
       = high[j] for some j in left subtree.

• Since there is no overlap in left, i and j don’t overlap.

• Refer back to: no overlap if
low[i] > high[j] or low[j] > high[i].

• Since low[i] ≤ high[j], must have low[j] > high[i].

• Now consider any interval k in right subtree.

• Because keys are low endpoint,
low[j] ≤ low[k], where j is in left and k is in right.

• Therefore, high[i] < low[j] ≤ low[k].

• Therefore, high[i] < low[k].

• Therefore, i and k do not overlap.


■ (theorem)


Solution to Exercise 14.1-5

Given an element x in an n-node order-statistic tree T and a natural number i, the
following procedure retrieves the ith successor of x in the linear order of T:

OS-Successor(T, x, i)
r ← OS-Rank(T, x)
s ← r + i
return OS-Select(root[T], s)

Since OS-Rank and OS-Select each take O(lg n) time, so does the procedure
OS-Successor.


Solution to Exercise 14.1-6 

When inserting node z, we search down the tree for the proper place for z. For each
node x on this path, add 1 to rank[x] if z is inserted within x’s left subtree, and
leave rank[x] unchanged if z is inserted within x’s right subtree. Similarly when
deleting, subtract 1 from rank[x] whenever the spliced-out node y had been in x’s
left subtree.

We also need to handle the rotations that occur during the fixup procedures for
insertion and deletion. Consider a left rotation on node x, where the pre-rotation
right child of x is y (so that x becomes y’s left child after the left rotation). We
leave rank[x] unchanged, and letting r = rank[y] before the rotation, we set
rank[y] ← r + rank[x]. Right rotations are handled in an analogous manner.


Solution to Exercise 14.1-7 

Let A[1 .. n] be the array of n distinct numbers.

One way to count the inversions is to add up, for each element, the number of larger
elements that precede it in the array:

\[
\text{\# of inversions} = \sum_{j=1}^{n} |\mathit{Inv}(j)| ,
\]

where Inv(j) = {i : i < j and A[i] > A[j]}.

Note that |Inv(j)| is related to A[j]’s rank in the subarray A[1 .. j] because the
elements in Inv(j) are the reason that A[j] is not positioned according to its rank.
Let r(j) be the rank of A[j] in A[1 .. j]. Then j = r(j) + |Inv(j)|, so we can
compute

|Inv(j)| = j − r(j)

by inserting A[1], ..., A[n] into an order-statistic tree and using OS-Rank to find
the rank of each A[j] in the tree immediately after it is inserted into the tree. (This
OS-Rank value is r(j).)

Insertion and OS-Rank each take O(lg n) time, and so the total time for n
elements is O(n lg n).
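The counting scheme can be sketched in Python as follows. Two simplifying assumptions are made and should be read as such: the tree is a plain size-augmented binary search tree with no rebalancing (so the O(n lg n) bound holds only when the tree happens to stay shallow), and the rank is accumulated on the way down during insertion rather than by a separate OS-Rank call. The names `Node`, `insert_and_rank`, and `count_inversions` are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1


def insert_and_rank(root, key):
    """Insert key (keys assumed distinct); return (root, rank of key after insertion)."""
    node = Node(key)
    if root is None:
        return node, 1
    rank = 1
    x = root
    while True:
        x.size += 1                       # the new node lands in x's subtree
        if key < x.key:
            if x.left is None:
                x.left = node
                break
            x = x.left
        else:
            # x and everything in x's left subtree precede the new key.
            rank += (x.left.size if x.left is not None else 0) + 1
            if x.right is None:
                x.right = node
                break
            x = x.right
    return root, rank


def count_inversions(a):
    """Sum of j - r(j) over all positions j, with r(j) the rank of a[j] in a[1..j]."""
    root, total = None, 0
    for j, key in enumerate(a, start=1):
        root, r = insert_and_rank(root, key)
        total += j - r
    return total
```

For the array 2, 3, 8, 6, 1 the per-element contributions are 0, 0, 0, 1, 4, for a total of 5 inversions.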


Solution to Exercise 14.2-2 

Yes, by Theorem 14.1, because the black-height of a node can be computed from 
the information at the node and its two children. Actually, the black-height can 
be computed from just one child’s information: the black-height of a node is the 
black-height of a red child, or the black height of a black child plus one. The 
second child does not need to be checked because of property 5 of red-black trees. 

Within the RB-Insert-Fixup and RB-Delete-Fixup procedures are color
changes, each of which potentially causes O(lg n) black-height changes. Let us
show that the color changes of the fixup procedures cause only local black-height
changes and thus are constant-time operations. Assume that the black-height of
each node x is kept in the field bh[x].

For RB-Insert-Fixup, there are 3 cases to examine. 

Case 1: z’s uncle is red.

• Before color changes, suppose that all subtrees α, β, γ, δ, ε have the same
black-height k with a black root, so that nodes A, B, C, and D have
black-heights of k + 1.

• After color changes, the only node whose black-height changed is node C.
To fix that, add bh[p[p[z]]] = bh[p[p[z]]] + 1 after line 7 in RB-Insert-Fixup.

• Since the number of black nodes between p[p[z]] and z remains the same,
nodes above p[p[z]] are not affected by the color change.

Case 2: z’s uncle y is black, and z is a right child.

Case 3: z’s uncle y is black, and z is a left child.

• With subtrees α, β, γ, δ, ε of black-height k, we see that even with color
changes and rotations, the black-heights of nodes A, B, and C remain the
same (k + 1).

Thus, RB-Insert-Fixup maintains its original O(lg n) time.

For RB-Delete-Fixup, there are 4 cases to examine. 

Case 1: x’s sibling w is red. 



• Even though case 1 changes colors of nodes and does a rotation, black- 
heights are not changed. 

• Case 1 changes the structure of the tree, but waits for cases 2, 3, and 4 to 
deal with the “extra black” on x. 

Case 2: x’s sibling w is black, and both of w’s children are black.

• w is colored red, and x’s “extra” black is moved up to p[x].

• Now we can add bh[p[x]] = bh[x] after line 10 in RB-Delete-Fixup.

• This is a constant-time update. Then, keep looping to deal with the extra
black on p[x].

Case 3: x’s sibling w is black, w’s left child is red, and w’s right child is black. 




• Regardless of the color changes and rotation of this case, the black-heights 
don’t change. 

• Case 3 just sets up the structure of the tree, so it can fall correctly into case 4.

Case 4: x’s sibling w is black, and w’s right child is red.



• Nodes A, C, and E keep the same subtrees, so their black-heights don’t 
change. 

• Add these two constant-time assignments in RB-Delete-Fixup after
line 20:

bh[p[x]] = bh[x] + 1.
bh[p[p[x]]] = bh[p[x]] + 1.

• The extra black is taken care of. Loop terminates.

Thus, RB-Delete-Fixup maintains its original O(lg n) time.

Therefore, we conclude that black-heights of nodes can be maintained as fields 
in red-black trees without affecting the asymptotic performance of red-black tree 
operations. 


Solution to Exercise 14.2-3 

No, because the depth of a node depends on the depth of its parent. When the depth
of a node changes, the depths of all nodes below it in the tree must be updated.
Updating the root node causes n − 1 other nodes to be updated, which would mean
that operations on the tree that change node depths might not run in O(lg n) time.


Solution to Exercise 14.3-3

As it travels down the tree, Interval-Search first checks whether current node x 
overlaps the query interval i and, if it does not, goes down to either the left or right 
child. If node x overlaps i, and some node in the right subtree overlaps i, but 
no node in the left subtree overlaps i, then because the keys are low endpoints, 
this order of checking (first x, then one child) will return the overlapping interval 
with the minimum low endpoint. On the other hand, if there is an interval that 
overlaps i in the left subtree of x, then checking x before the left subtree might 
cause the procedure to return an interval whose low endpoint is not the minimum 
of those that overlap i. Therefore, if there is a possibility that the left subtree might
contain an interval that overlaps i, we need to check the left subtree first. If there is
no overlap in the left subtree but node x overlaps i, then we return x. We check the
right subtree under the same conditions as in Interval-Search: the left subtree
cannot contain an interval that overlaps i, and node x does not overlap i, either.

Because we might search the left subtree first, it is easier to write the pseudocode to
use a recursive procedure Min-Interval-Search-From(T, x, i), which returns
the node overlapping i with the minimum low endpoint in the subtree rooted at x,
or nil[T] if there is no such node.

Min-Interval-Search(T, i)
return Min-Interval-Search-From(T, root[T], i)

Min-Interval-Search-From(T, x, i)
if left[x] ≠ nil[T] and max[left[x]] ≥ low[i]
   then y ← Min-Interval-Search-From(T, left[x], i)
        if y ≠ nil[T]
           then return y
        elseif i overlaps int[x]
           then return x
        else return nil[T]
elseif i overlaps int[x]
   then return x
else return Min-Interval-Search-From(T, right[x], i)

The call Min-Interval-Search(T, i) takes O(lg n) time, since each recursive
call of Min-Interval-Search-From goes one node lower in the tree, and the
height of the tree is O(lg n).
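A Python sketch of the recursive search follows. As assumptions for illustration, `None` replaces nil[T] (so the function checks for an empty subtree explicitly before touching a node), and the `INode` class and hand-built example tree are hypothetical rather than taken from the text.

```python
class INode:
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        # max = largest endpoint anywhere in this node's subtree
        self.max = max([high] + [c.max for c in (left, right) if c is not None])


def node_overlaps(x, low, high):
    return x.low <= high and low <= x.high


def min_interval_search(x, low, high):
    """Overlapping node with the minimum low endpoint in x's subtree, or None."""
    if x is None:
        return None
    if x.left is not None and x.left.max >= low:
        # The left subtree might hold an overlap, so it must be checked first.
        y = min_interval_search(x.left, low, high)
        if y is not None:
            return y
        # No overlap on the left: by the search theorem, none on the right either.
        return x if node_overlaps(x, low, high) else None
    if node_overlaps(x, low, high):
        return x
    return min_interval_search(x.right, low, high)


# Hypothetical example tree, keyed on low endpoints.
EXAMPLE = INode(16, 21,
                INode(8, 9, INode(5, 8, INode(0, 3), INode(6, 10)), INode(15, 23)),
                INode(25, 30, INode(17, 19, None, INode(19, 20)), INode(26, 26)))
```

On the example tree, the query [9, 18] overlaps the root [16, 21], but the procedure returns [6, 10], the overlapping interval with the smallest low endpoint.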


Solution to Exercise 14.3-6 

1. Underlying data structure: 

A red-black tree in which the numbers in the set are stored simply as the keys
of the nodes.

Search is then just the ordinary Tree-Search for binary search trees, which
runs in O(lg n) time on red-black trees.

2. Additional information: 

The red-black tree is augmented by the following fields in each node x : 


• min-gap[x] contains the minimum gap in the subtree rooted at x. It has the
magnitude of the difference of the two closest numbers in the subtree rooted
at x. If x is a leaf (its children are all nil[T]), let min-gap[x] = ∞.

• min-val[x] contains the minimum value (key) in the subtree rooted at x.

• max-val[x] contains the maximum value (key) in the subtree rooted at x.


3. Maintaining the information: 

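The three fields added to the tree can each be computed from information in the
node and its children. Hence by Theorem 14.1, they can be maintained during
insertion and deletion without affecting the O(lg n) running time:

\[
\mathit{min\text{-}val}[x] =
\begin{cases}
\mathit{min\text{-}val}[\mathit{left}[x]] & \text{if there's a left subtree} , \\
\mathit{key}[x] & \text{otherwise} ,
\end{cases}
\]

\[
\mathit{max\text{-}val}[x] =
\begin{cases}
\mathit{max\text{-}val}[\mathit{right}[x]] & \text{if there's a right subtree} , \\
\mathit{key}[x] & \text{otherwise} ,
\end{cases}
\]

\[
\mathit{min\text{-}gap}[x] = \min
\begin{cases}
\mathit{min\text{-}gap}[\mathit{left}[x]] & \text{($\infty$ if no left subtree)} , \\
\mathit{min\text{-}gap}[\mathit{right}[x]] & \text{($\infty$ if no right subtree)} , \\
\mathit{key}[x] - \mathit{max\text{-}val}[\mathit{left}[x]] & \text{($\infty$ if no left subtree)} , \\
\mathit{min\text{-}val}[\mathit{right}[x]] - \mathit{key}[x] & \text{($\infty$ if no right subtree)} .
\end{cases}
\]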


In fact, the reason for defining the min-val and max-val fields is to make it 
possible to compute min-gap from information at the node and its children. 

4. New operation: 

Min-Gap simply returns the min-gap stored at the tree root. Thus, its running
time is O(1).

Note that in addition (not asked for in the exercise), it is possible to find the
two closest numbers in O(lg n) time. Starting from the root, look for where the
minimum gap (the one stored at the root) came from. At each node x, simulate
the computation of min-gap[x] to figure out where min-gap[x] came from. If
it came from a subtree’s min-gap field, continue the search in that subtree. If
it came from a computation with x’s key, then x and that other number are the
closest numbers.
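The field definitions and the walk that recovers the two closest numbers can be sketched in Python as follows. The `GapNode` class is an assumption for this example: it computes the three fields once at construction from already-built children, which mirrors the formulas but not the incremental maintenance a red-black tree would perform, and `closest_pair` is a hypothetical name.

```python
import math


class GapNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.min_val = left.min_val if left is not None else key
        self.max_val = right.max_val if right is not None else key
        self.min_gap = min(
            left.min_gap if left is not None else math.inf,
            right.min_gap if right is not None else math.inf,
            key - left.max_val if left is not None else math.inf,
            right.min_val - key if right is not None else math.inf,
        )


def closest_pair(x):
    """Walk down from x to two keys whose difference equals x.min_gap, or None."""
    while x is not None:
        if x.left is not None and x.key - x.left.max_val == x.min_gap:
            return x.left.max_val, x.key      # gap involves x and its predecessor
        if x.right is not None and x.right.min_val - x.key == x.min_gap:
            return x.key, x.right.min_val     # gap involves x and its successor
        if x.left is not None and x.left.min_gap == x.min_gap:
            x = x.left                        # gap came from the left subtree
        else:
            x = x.right                       # otherwise it came from the right
    return None
```

Each step of the walk either returns or descends one level, so the search visits one root-to-leaf path.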


Solution to Exercise 14.3-7 

General idea: Move a sweep line from left to right, while maintaining the set of 
rectangles currently intersected by the line in an interval tree. The interval tree 
will organize all rectangles whose x interval includes the current position of the 
sweep line, and it will be based on the y intervals of the rectangles, so that any 
overlapping y intervals in the interval tree correspond to overlapping rectangles.

Details:

1. Sort the rectangles by their x-coordinates. (Actually, each rectangle must
appear twice in the sorted list: once for its left x-coordinate and once for its right
x-coordinate.)

2. Scan the sorted list (from lowest to highest x-coordinate).

• When an x-coordinate of a left edge is found, check whether the rectangle’s
y-coordinate interval overlaps an interval in the tree, and insert the rectangle
(keyed on its y-coordinate interval) into the tree.

• When an x-coordinate of a right edge is found, delete the rectangle from the
interval tree.

The interval tree always contains the set of “open” rectangles intersected by the 
sweep line. If an overlap is ever found in the interval tree, there are overlapping 
rectangles. 

Time: O(n lg n)

• O(n lg n) to sort the rectangles (we can use merge sort or heap sort).

• O(n lg n) for interval-tree operations (insert, delete, and check for overlap).


Solution to Problem 14-1 

a. Assume for the purpose of contradiction that there is no point of maximum
overlap at an endpoint of a segment. The maximum overlap point p is in the
interior of m segments. Actually, p is in the interior of the intersection of those
m segments. Now look at one of the endpoints p′ of the intersection of the m
segments. Point p′ has the same overlap as p because it is in the same
intersection of m segments, and so p′ is also a point of maximum overlap. Moreover, p′
is an endpoint of a segment (otherwise the intersection would not end there),
which contradicts our assumption that there is no point of maximum overlap at
an endpoint of a segment. Thus, there is always a point of maximum overlap
which is an endpoint of one of the segments.

b. Keep a balanced binary tree of the endpoints. That is, to insert an interval,
we insert its endpoints separately. With each left endpoint e, associate a value
p[e] = +1 (increasing the overlap by 1). With each right endpoint e associate a
value p[e] = −1 (decreasing the overlap by 1). When multiple endpoints have
the same value, insert all the left endpoints with that value before inserting any
of the right endpoints with that value.

Here’s some intuition. Let e_1, e_2, ..., e_n be the sorted sequence of endpoints
corresponding to our intervals. Let s(i, j) denote the sum p[e_i] + p[e_{i+1}] +
··· + p[e_j] for 1 ≤ i ≤ j ≤ n. We wish to find an i maximizing s(1, i).

Each node x stores three new attributes. Suppose that the subtree rooted at x
includes the endpoints e_{l[x]}, ..., e_{r[x]}. We store v[x] = s(l[x], r[x]), the sum of
the values of all nodes in x’s subtree. We also store m[x], the maximum value
obtained by the expression s(l[x], i) for any i in {l[x], l[x] + 1, ..., r[x]}.
Finally, we store o[x] as the value of i for which m[x] achieves its maximum. For
the sentinel, we define v[nil[T]] = m[nil[T]] = 0.
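The prefix-sum intuition can be checked on a fixed set of intervals with the short Python sketch below. It is a static stand-in, not the dynamic tree described here: it sorts the endpoints once and scans the running sum s(1, i), with ties broken so that left endpoints come before right endpoints. The function name and the tuple encoding of endpoints are assumptions for the example.

```python
def point_of_maximum_overlap(intervals):
    """Given closed intervals (low, high), return (point, overlap) maximizing overlap."""
    endpoints = []
    for low, high in intervals:
        endpoints.append((low, 0, +1))    # left endpoint: p = +1
        endpoints.append((high, 1, -1))   # right endpoint: p = -1
    # Tuples sort by value, then by the flag, so left endpoints precede
    # right endpoints that share the same value.
    endpoints.sort()
    best_point, best, running = None, 0, 0
    for value, _, p in endpoints:
        running += p                      # running equals s(1, i) at this endpoint
        if running > best:
            best, best_point = running, value
    return best_point, best
```

For the intervals [0, 3], [2, 5], [3, 7], [8, 9], the running sums are 1, 2, 3, 2, 1, 0, 1, 0, so the maximum overlap 3 is reached at the endpoint 3.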

We can compute these attributes in a bottom-up fashion to satisfy the
requirements of Theorem 14.1:

\[
v[x] = v[\mathit{left}[x]] + p[x] + v[\mathit{right}[x]] ,
\]

\[
m[x] = \max
\begin{cases}
m[\mathit{left}[x]] & \text{(max is in $x$'s left subtree)} , \\
v[\mathit{left}[x]] + p[x] & \text{(max is at $x$)} , \\
v[\mathit{left}[x]] + p[x] + m[\mathit{right}[x]] & \text{(max is in $x$'s right subtree)} .
\end{cases}
\]

The computation of v[x] is straightforward. The computation of m[x] bears
further explanation. Recall that it is the maximum value of the sum of the
p values for the nodes in x’s subtree, starting at l[x], which is the leftmost
endpoint in x’s subtree, and ending at any node i in x’s subtree. The value
of i that maximizes this sum is either a node in x’s left subtree, x itself, or
a node in x’s right subtree. If i is a node in x’s left subtree, then m[left[x]]
represents a sum starting at l[x], and hence m[x] = m[left[x]]. If i is x itself,
then m[x] represents the sum of all p values in x’s left subtree plus p[x], so
that m[x] = v[left[x]] + p[x]. Finally, if i is in x’s right subtree, then m[x]
represents the sum of all p values in x’s left subtree, plus p[x], plus the sum
of some set of p values in x’s right subtree. Moreover, the values taken from
x’s right subtree must start from the leftmost endpoint in the right subtree. To
maximize this sum, we need to maximize the sum from the right subtree, and
that value is precisely m[right[x]]. Hence, in this case, m[x] = v[left[x]] +
p[x] + m[right[x]].

Once we understand how to compute m[x], it is straightforward to compute 
o[x] from the information in x and its two children. Thus, we can implement 
the operations as follows: 


• Interval-Insert: insert two nodes, one for each endpoint of the interval.

• Interval-Delete: delete the two nodes representing the interval
endpoints.

• Find-POM: return the interval whose endpoint is represented by o[root[T]].


Because of how we have defined the new attributes, Theorem 14.1 says that
each operation runs in O(lg n) time. In fact, Find-POM takes only O(1) time.


Solution to Problem 14-2 

a. We use a circular list in which each element has two fields, key and next. At 
the beginning, we initialize the list to contain the keys 1, 2, ..., n in that order. 
This initialization takes O(n) time, since there is only a constant amount of 
work per element (i.e., setting its key and its next fields). We make the list 
circular by letting the next field of the last element point to the first element. 

We then start scanning the list from the beginning. We output and then delete
every mth element, until the list becomes empty. The output sequence is the
(n, m)-Josephus permutation. This process takes O(m) time per element, for a
total time of O(mn). Since m is a constant, we get O(mn) = O(n) time, as
required.
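The circular-list scan can be sketched in Python as follows. The `Cell` class and the function name are assumptions for the example; the list is singly linked with only `key` and `next` fields, so the scan keeps a pointer to the element just before the one to delete.

```python
class Cell:
    def __init__(self, key):
        self.key = key
        self.next = None


def josephus_list(n, m):
    """Return the (n, m)-Josephus permutation using a circular singly linked list."""
    cells = [Cell(k) for k in range(1, n + 1)]
    for a, b in zip(cells, cells[1:] + cells[:1]):
        a.next = b                        # the last cell points back to the first
    prev = cells[-1]                      # cell just before the scan position
    out = []
    for _ in range(n):
        for _ in range(m - 1):            # step past m - 1 remaining elements
            prev = prev.next
        out.append(prev.next.key)         # output the m-th element ...
        prev.next = prev.next.next        # ... and unlink it
    return out
```

Each output costs m − 1 pointer steps plus constant work, giving O(mn) overall. For n = 7 and m = 3 the function returns 3, 6, 2, 7, 5, 1, 4.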

b. We can use an order-statistic tree, straight out of Section 14.1. Why? Suppose 
that we are at a particular - spot in the permutation, and let’s say that it’s the / th 
largest remaining person. Suppose that there are k < n people remaining. Then 
we will remove person j , decrement k to reflect having removed this person, 
and then go on to the (j +m — l)th largest remaining person (subtract 1 because 
we have just removed the /th largest). But that assumes that j + m < k. If not, 
then we use a little modular arithmetic, as shown below. 

In detail, we use an order-statistic tree T, and we call the procedures
OS-Insert, OS-Delete, OS-Rank, and OS-Select:

Josephus(n, m)
initialize T to be empty
for j ← 1 to n
    do create a node x with key[x] = j
       OS-Insert(T, x)
k ← n
j ← m
while k ≥ 2
    do x ← OS-Select(root[T], j)
       print key[x]
       OS-Delete(T, x)
       k ← k - 1
       j ← ((j + m - 2) mod k) + 1
print key[OS-Select(root[T], 1)]

The above procedure is easier to understand. Here’s a streamlined version:

Josephus(n, m)
initialize T to be empty
for j ← 1 to n
    do create a node x with key[x] = j
       OS-Insert(T, x)
j ← 1
for k ← n downto 1
    do j ← ((j + m - 2) mod k) + 1
       x ← OS-Select(root[T], j)
       print key[x]
       OS-Delete(T, x)

Either way, it takes O(n lg n) time to build up the order-statistic tree T, and
then we make O(n) calls to the order-statistic-tree procedures, each of which
takes O(lg n) time. Thus, the total time is O(n lg n).
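The rank arithmetic of the streamlined procedure can be tried out with the Python sketch below. Note the substitution: instead of the order-statistic red-black tree of Section 14.1, it uses a Fenwick (binary indexed) tree over 0/1 "still present" flags, which also supports select and delete in O(lg n) time. The names josephus_ranks, select, and remove are our own.

```python
def josephus_ranks(n, m):
    """(n, m)-Josephus permutation via the update j = ((j + m - 2) mod k) + 1."""
    # tree is a Fenwick tree over flags 1..n, all initially 1 (everyone present).
    tree = [0] * (n + 1)
    for i in range(1, n + 1):          # O(n) construction
        tree[i] += 1
        parent = i + (i & -i)
        if parent <= n:
            tree[parent] += tree[i]

    def remove(i):
        """Clear the flag of key i (plays the role of OS-Delete)."""
        while i <= n:
            tree[i] -= 1
            i += i & -i

    def select(j):
        """Return the j-th smallest remaining key (plays the role of OS-Select)."""
        pos = 0
        step = 1 << n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= n and tree[nxt] < j:
                pos = nxt
                j -= tree[nxt]
            step >>= 1
        return pos + 1

    out = []
    j = 1
    for k in range(n, 0, -1):          # k = number of people remaining
        j = (j + m - 2) % k + 1
        x = select(j)
        out.append(x)
        remove(x)
    return out


print(josephus_ranks(7, 3))
```

For n = 7 and m = 3 this prints [3, 6, 2, 7, 5, 1, 4], the same permutation the list-based method of part (a) produces.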
Dynamic Programming

• Not a specific algorithm, but a technique (like divide-and-conquer). 

• Developed back in the day when “programming” meant “tabular method” (like
linear programming). Doesn’t really refer to computer programming.

• Used for optimization problems: 

• Find a solution with the optimal value. 

• Minimization or maximization. (We’ll see both.) 

Four-step method 

1. Characterize the structure of an optimal solution. 

2. Recursively define the value of an optimal solution. 

3. Compute the value of an optimal solution in a bottom-up fashion. 

4. Construct an optimal solution from computed information. 


Assembly-line scheduling 

A simple dynamic-programming example. Actually, solvable by a graph algorithm 
that we’ll see later in the course. But a good warm-up for dynamic programming. 

[New in the second edition of the book.]

[Figure: the factory’s two assembly lines, with stations S_{1,1}, S_{1,2}, ..., S_{1,5} shown on line 1.]



Automobile factory with two assembly lines. 

• Each line has n stations: S_{1,1}, ..., S_{1,n} and S_{2,1}, ..., S_{2,n}.

• Corresponding stations S_{1,j} and S_{2,j} perform the same function but can take
different amounts of time a_{1,j} and a_{2,j}.

• Entry times e_1 and e_2.

• Exit times x_1 and x_2.

• After going through a station, can either

  • stay on same line; no cost, or

  • transfer to other line; cost after S_{i,j} is t_{i,j}. (j = 1, ..., n - 1. No t_{i,n}, because
    the assembly line is done after S_{i,n}.)

Problem: Given all these costs (time = cost), what stations should be chosen from 
line 1 and from line 2 for fastest way through factory? 

Try all possibilities? 

• Each candidate is fully specified by which stations from line 1 are included. 
Looking for a subset of line 1 stations. 

• Line 1 has n stations. 

• 2^n subsets.

• Infeasible when n is large. 

Structure of an optimal solution 

Think about fastest way from entry through S_{1,j}.

• If j = 1, easy: just determine how long it takes to get through S_{1,1}.

• If j ≥ 2, have two choices of how to get to S_{1,j}:

  • Through S_{1,j-1}, then directly to S_{1,j}.

  • Through S_{2,j-1}, then transfer over to S_{1,j}.

Suppose fastest way is through S_{1,j-1}.

Key observation: We must have taken a fastest way from entry through S_{1,j-1} in
this solution. If there were a faster way through S_{1,j-1}, we would use it instead to
come up with a faster way through S_{1,j}.

Now suppose a fastest way is through S_{2,j-1}. Again, we must have taken a fastest
way through S_{2,j-1}. Otherwise use some faster way through S_{2,j-1} to give a faster
way through S_{1,j}.

Generally: An optimal solution to a problem (fastest way through S_{1,j}) contains
within it an optimal solution to subproblems (fastest way through S_{1,j-1} or S_{2,j-1}).

This is optimal substructure. 

Use optimal substructure to construct optimal solution to problem from optimal 
solutions to subproblems. 

Fastest way through S_{1,j} is either

• fastest way through S_{1,j-1} then directly through S_{1,j}, or

• fastest way through S_{2,j-1}, transfer from line 2 to line 1, then through S_{1,j}.

Symmetrically:

Fastest way through S_{2,j} is either

• fastest way through S_{2,j-1} then directly through S_{2,j}, or

• fastest way through S_{1,j-1}, transfer from line 1 to line 2, then through S_{2,j}.

Therefore, to solve problems of finding a fastest way through S_{1,j} and S_{2,j}, solve
subproblems of finding a fastest way through S_{1,j-1} and S_{2,j-1}.


Recursive solution 

Let f_i[j] = fastest time to get through S_{i,j}, i = 1, 2 and j = 1, ..., n.

Goal: fastest time to get all the way through = f*.

f* = min(f_1[n] + x_1, f_2[n] + x_2)

f_1[1] = e_1 + a_{1,1}

f_2[1] = e_2 + a_{2,1}

For j = 2, ..., n:

f_1[j] = min(f_1[j - 1] + a_{1,j}, f_2[j - 1] + t_{2,j-1} + a_{1,j})

f_2[j] = min(f_2[j - 1] + a_{2,j}, f_1[j - 1] + t_{1,j-1} + a_{2,j})

f_i[j] gives the value of an optimal solution. What if we want to construct an
optimal solution?

• l_i[j] = line # (1 or 2) whose station j - 1 is used in fastest way through S_{i,j}.

• In other words, S_{l_i[j], j-1} precedes S_{i,j}.

• Defined for i = 1, 2 and j = 2, ..., n.

• l* = line # whose station n is used.



15-4 


Lecture Notes for Chapter 15: Dynamic Programming 


For example:

    j         1    2    3    4    5
    f_1[j]    9   18   20   24   32
    f_2[j]   12   16   22   25   30

    j         2    3    4    5
    l_1[j]    1    2    1    1
    l_2[j]    1    2    1    2

f* = 35    l* = 1

Go through optimal way given by l values. (Shaded path in earlier figure.)

Compute an optimal solution 

Could just write a recursive algorithm based on above recurrences. 

• Let r_i(j) = # of references made to f_i[j].

• r_1(n) = r_2(n) = 1.

• r_1(j) = r_2(j) = r_1(j + 1) + r_2(j + 1) for j = 1, ..., n - 1.

Claim

r_i(j) = 2^{n-j}.

Proof Induction on j, down from n.

Basis: j = n. 2^{n-j} = 2^0 = 1 = r_i(n).

Inductive step: Assume r_i(j + 1) = 2^{n-(j+1)}.

Then r_i(j) = r_1(j + 1) + r_2(j + 1)
            = 2^{n-(j+1)} + 2^{n-(j+1)}
            = 2^{n-(j+1)+1}
            = 2^{n-j}. ■ (claim)

Therefore, f_1[1] alone is referenced 2^{n-1} times!

So top down isn’t a good way to compute f_i[j].

Observation: f_i[j] depends only on f_1[j - 1] and f_2[j - 1] (for j ≥ 2).

So compute in order of increasing j.

Fastest-Way(a, t, e, x, n)
f_1[1] ← e_1 + a_{1,1}
f_2[1] ← e_2 + a_{2,1}
for j ← 2 to n
    do if f_1[j - 1] + a_{1,j} ≤ f_2[j - 1] + t_{2,j-1} + a_{1,j}
          then f_1[j] ← f_1[j - 1] + a_{1,j}
               l_1[j] ← 1
          else f_1[j] ← f_2[j - 1] + t_{2,j-1} + a_{1,j}
               l_1[j] ← 2
       if f_2[j - 1] + a_{2,j} ≤ f_1[j - 1] + t_{1,j-1} + a_{2,j}
          then f_2[j] ← f_2[j - 1] + a_{2,j}
               l_2[j] ← 2
          else f_2[j] ← f_1[j - 1] + t_{1,j-1} + a_{2,j}
               l_2[j] ← 1
if f_1[n] + x_1 ≤ f_2[n] + x_2
   then f* = f_1[n] + x_1
        l* = 1
   else f* = f_2[n] + x_2
        l* = 2

Go through example.


Constructing an optimal solution 

Print-Stations(l, n)
i ← l*
print “line ” i “, station ” n
for j ← n downto 2
    do i ← l_i[j]
       print “line ” i “, station ” j - 1

Go through example.

Time = Θ(n)
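As a rough Python sketch of Fastest-Way and Print-Stations together: the function names and the 0-indexed arrays are our own choices, and the station times, transfer times, entry times, and exit times below are assumed values, picked only because they reproduce the f and l values of the five-station example in these notes.

```python
def fastest_way(a, t, e, x):
    """Bottom-up assembly-line DP.

    a[i][j] is the time at station j of line i and t[i][j] is the transfer
    cost after that station (0-indexed here, unlike the 1-indexed notes);
    e and x hold the entry and exit times. Lines are reported as 1 and 2.
    Returns (f_star, l_star, l).
    """
    n = len(a[0])
    f = [[0] * n for _ in range(2)]
    l = [[0] * n for _ in range(2)]      # l[i][0] is unused
    f[0][0] = e[0] + a[0][0]
    f[1][0] = e[1] + a[1][0]
    for j in range(1, n):
        for i in (0, 1):
            other = 1 - i
            stay = f[i][j - 1] + a[i][j]
            switch = f[other][j - 1] + t[other][j - 1] + a[i][j]
            if stay <= switch:
                f[i][j], l[i][j] = stay, i + 1
            else:
                f[i][j], l[i][j] = switch, other + 1
    if f[0][n - 1] + x[0] <= f[1][n - 1] + x[1]:
        return f[0][n - 1] + x[0], 1, l
    return f[1][n - 1] + x[1], 2, l


def stations(l, l_star):
    """Line used at each of stations 1..n, recovered by walking l backwards."""
    n = len(l[0])
    i = l_star
    path = [i]
    for j in range(n - 1, 0, -1):
        i = l[i - 1][j]
        path.append(i)
    return path[::-1]


# Assumed instance (not given in the text); it yields the example's tables.
a = [[7, 9, 3, 4, 8], [8, 5, 6, 4, 5]]
t = [[2, 3, 1, 3], [2, 1, 2, 2]]
e, x = (2, 4), (3, 6)
f_star, l_star, l = fastest_way(a, t, e, x)
print(f_star, l_star, stations(l, l_star))
```

With these assumed inputs the sketch prints 35 1 [1, 2, 1, 1, 1]: stations 1, 3, 4, and 5 on line 1 and station 2 on line 2.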


Longest common subsequence 


Problem: Given 2 sequences, X = ⟨x_1, ..., x_m⟩ and Y = ⟨y_1, ..., y_n⟩. Find
a subsequence common to both whose length is longest. A subsequence doesn’t
have to be consecutive, but it has to be in order.

[To come up with examples of longest common subsequences, search the dictionary
for all words that contain the word you are looking for as a subsequence. On
a UNIX system, for example, to find all the words with pine as a subsequence,
use the command grep '.*p.*i.*n.*e.*' dict, where dict is your local
dictionary. Then check if that word is actually a longest common subsequence.
Working C code for finding a longest common subsequence of two strings
appears at http://www.cs.dartmouth.edu/~thc/code/lcs.c]

Examples: [The examples are of different types of trees.]

[Figure: example word pairs; the words that survive are pioneer, snowflake, becalm, and scholarly.]


Brute-force algorithm: 

For every subsequence of X, check whether it’s a subsequence of Y. 

Time: Θ(n 2^m).

• 2^m subsequences of X to check.

• Each subsequence takes Θ(n) time to check: scan Y for first letter, from there
scan for second, and so on.


Optimal substructure 

Notation: 

X_i = prefix ⟨x_1, ..., x_i⟩

Y_i = prefix ⟨y_1, ..., y_i⟩

Theorem

Let Z = ⟨z_1, ..., z_k⟩ be any LCS of X and Y.

1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.

2. If x_m ≠ y_n, then z_k ≠ x_m ⇒ Z is an LCS of X_{m-1} and Y.

3. If x_m ≠ y_n, then z_k ≠ y_n ⇒ Z is an LCS of X and Y_{n-1}.

Proof

1. First show that z_k = x_m = y_n. Suppose not. Then make a subsequence Z' =
⟨z_1, ..., z_k, x_m⟩. It’s a common subsequence of X and Y and has length k + 1
⇒ Z' is a longer common subsequence than Z ⇒ contradicts Z being an LCS.

Now show Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}. Clearly, it’s a common
subsequence. Now suppose there exists a common subsequence W of X_{m-1} and Y_{n-1}
that’s longer than Z_{k-1} ⇒ length of W ≥ k. Make subsequence W' by
appending x_m to W. W' is common subsequence of X and Y, has length ≥ k + 1 ⇒
contradicts Z being an LCS.

2. If z_k ≠ x_m, then Z is a common subsequence of X_{m-1} and Y. Suppose there
exists a subsequence W of X_{m-1} and Y with length > k. Then W is a common
subsequence of X and Y ⇒ contradicts Z being an LCS.

3. Symmetric to 2. ■ (theorem)

Therefore, an LCS of two sequences contains as a prefix an LCS of prefixes of the 
sequences. 


Recursive formulation 


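Define c[i, j] = length of LCS of X_i and Y_j. We want c[m, n].

\[
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 , \\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j , \\
\max(c[i-1, j], c[i, j-1]) & \text{if } i, j > 0 \text{ and } x_i \ne y_j .
\end{cases}
\]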


Again, we could write a recursive algorithm based on this formulation. 
Try with bozo, bat. 



• Lots of repeated subproblems. 

• Instead of recomputing, store in a table. 


Compute length of optimal solution 

LCS-Length(X, Y, m, n)
for i ← 1 to m
    do c[i, 0] ← 0
for j ← 0 to n
    do c[0, j] ← 0
for i ← 1 to m
    do for j ← 1 to n
           do if x_i = y_j
                 then c[i, j] ← c[i - 1, j - 1] + 1
                      b[i, j] ← “↖”
                 else if c[i - 1, j] ≥ c[i, j - 1]
                         then c[i, j] ← c[i - 1, j]
                              b[i, j] ← “↑”
                         else c[i, j] ← c[i, j - 1]
                              b[i, j] ← “←”
return c and b

Print-LCS(b, X, i, j)
if i = 0 or j = 0
   then return
if b[i, j] = “↖”
   then Print-LCS(b, X, i - 1, j - 1)
        print x_i
elseif b[i, j] = “↑”
   then Print-LCS(b, X, i - 1, j)
else Print-LCS(b, X, i, j - 1)

• Initial call is Print-LCS(b, X, m, n).

• b[i, j] points to table entry whose subproblem we used in solving LCS of X_i
and Y_j.

• When b[i, j] = ↖, we have extended LCS by one character. So longest
common subsequence = entries with ↖ in them.


Demonstration: show only c[i, j]:

[Figure: table of c[i, j] values for the demonstration.]

Time: Θ(mn)
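The table-filling procedure and the walk back through b can be sketched in Python as below. The function names are ours, the strings "diag", "up", and "left" stand in for the three arrows, and the reconstruction is written iteratively rather than recursively. The word pair used at the end is just an illustrative input.

```python
def lcs_length(X, Y):
    """Fill the c and b tables for sequences X and Y (tables are 1-indexed)."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]      # row 0 and column 0 stay 0
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"
    return c, b


def read_lcs(b, X, i, j):
    """Follow the b entries from (i, j) back to the border, collecting matches."""
    out = []
    while i > 0 and j > 0:
        if b[i][j] == "diag":
            out.append(X[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


X, Y = "spanking", "amputation"
c, b = lcs_length(X, Y)
print(c[len(X)][len(Y)], read_lcs(b, X, len(X), len(Y)))
```

For this pair the sketch prints 4 pain. Both loops touch each of the mn table entries once, matching the Θ(mn) bound.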


Optimal binary search trees 

[Also new in the second edition.] 

• Given sequence K = ⟨k_1, k_2, ..., k_n⟩ of n distinct keys, sorted (k_1 < k_2 <
··· < k_n).

• Want to build a binary search tree from the keys.

• For k_i, have probability p_i that a search is for k_i.

• Want BST with minimum expected search cost.

• Actual cost = # of items examined.

For key k_i, cost = depth_T(k_i) + 1, where depth_T(k_i) = depth of k_i in BST T.

\[
\begin{aligned}
\mathrm{E}[\text{search cost in } T]
  &= \sum_{i=1}^{n} (\mathrm{depth}_T(k_i) + 1) \cdot p_i \\
  &= \sum_{i=1}^{n} \mathrm{depth}_T(k_i) \cdot p_i + \sum_{i=1}^{n} p_i \\
  &= 1 + \sum_{i=1}^{n} \mathrm{depth}_T(k_i) \cdot p_i
     \quad \text{(since probabilities sum to 1)} \qquad (*)
\end{aligned}
\]

[Similar to optimal BST problem in the book, but simplified here: we assume that
all searches are successful. Book has probabilities of searches between keys in
tree.]


Example:

    i      1     2     3     4     5
    p_i   .25   .2    .05   .2    .3

    i    depth_T(k_i)    depth_T(k_i) · p_i
    1         1               .25
    2         0               0
    3         2               .1
    4         1               .2
    5         2               .6
                             1.15

Therefore, E [search cost] = 2.15.

[Figure: a different BST on the same keys.]

    i    depth_T(k_i)    depth_T(k_i) · p_i
    1         1               .25
    2         0               0
    3         3               .15
    4         2               .4
    5         1               .3
                             1.10

Therefore, E [search cost] = 2.10, which turns out to be optimal.
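Equation (*) is easy to check numerically. The short Python sketch below (the function name is ours) recomputes the two expected costs from the depth columns of the tables and the probabilities .25, .2, .05, .2, .3.

```python
def expected_search_cost(depths, p):
    """Equation (*): 1 + sum of depth_T(k_i) * p_i, assuming the p_i sum to 1."""
    return 1 + sum(d * q for d, q in zip(depths, p))


p = [0.25, 0.2, 0.05, 0.2, 0.3]
first_tree = expected_search_cost([1, 0, 2, 1, 2], p)    # depths from the first table
second_tree = expected_search_cost([1, 0, 3, 2, 1], p)   # depths from the second table
print(round(first_tree, 2), round(second_tree, 2))
```

This prints 2.15 2.1, agreeing with the two tables.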

Observations: 

• Optimal BST might not have smallest height. 

• Optimal BST might not have highest-probability key at root. 

Build by exhaustive checking? 

• Construct each n-node BST.

• For each, put in keys.

• Then compute expected search cost.

• But there are Ω(4^n / n^{3/2}) different BSTs with n nodes.

Optimal substructure 

Consider any subtree of a BST. It contains keys in a contiguous range k_i, ..., k_j
for some 1 ≤ i ≤ j ≤ n.

If T is an optimal BST and T contains subtree T' with keys k_i, ..., k_j, then T'
must be an optimal BST for keys k_i, ..., k_j.

Proof Cut and paste. ■ 


Use optimal substructure to construct an optimal solution to the problem from
optimal solutions to subproblems:

• Given keys k_i, ..., k_j (the problem).

• One of them, k_r, where i ≤ r ≤ j, must be the root.

• Left subtree of k_r contains k_i, ..., k_{r-1}.

• Right subtree of k_r contains k_{r+1}, ..., k_j.

• If

  • we examine all candidate roots k_r, for i ≤ r ≤ j, and

  • we determine all optimal BSTs containing k_i, ..., k_{r-1} and containing
    k_{r+1}, ..., k_j,

  then we’re guaranteed to find an optimal BST for k_i, ..., k_j.

Recursive solution 

Subproblem domain: 

• Find optimal BST for k_i, ..., k_j, where i ≥ 1, j ≤ n, j ≥ i - 1.

• When j = i - 1, the tree is empty.

Define e[i, j] = expected search cost of optimal BST for k_i, ..., k_j.

If j = i - 1, then e[i, j] = 0.

If j ≥ i,

• Select a root k_r, for some i ≤ r ≤ j.

• Make an optimal BST with k_i, ..., k_{r-1} as the left subtree.

• Make an optimal BST with k_{r+1}, ..., k_j as the right subtree.

• Note: when r = i, left subtree is k_i, ..., k_{i-1}; when r = j, right subtree is
k_{j+1}, ..., k_j.

When a subtree becomes a subtree of a node:

• Depth of every node in subtree goes up by 1.

• Expected search cost increases by

\[ w(i, j) = \sum_{l=i}^{j} p_l \quad \text{(refer to equation (*))} . \]

If k_r is the root of an optimal BST for k_i, ..., k_j:

e[i, j] = p_r + (e[i, r - 1] + w(i, r - 1)) + (e[r + 1, j] + w(r + 1, j)) .

But w(i, j) = w(i, r - 1) + p_r + w(r + 1, j).

Therefore, e[i, j] = e[i, r - 1] + e[r + 1, j] + w(i, j).

This equation assumes that we already know which key is k_r.
We don’t.

Try all candidates, and pick the best one:

\[
e[i, j] =
\begin{cases}
0 & \text{if } j = i - 1 , \\
\min\limits_{i \le r \le j} \{ e[i, r-1] + e[r+1, j] + w(i, j) \} & \text{if } i \le j .
\end{cases}
\]


Could write a recursive algorithm... 


Computing an optimal solution 

As “usual,” we’ll store the values in a table: 



[Figure: the table e, annotated to show that it can store e[n + 1, n] and e[1, 0].]

• Will use only entries e[i, j], where j ≥ i - 1.

• Will also compute

root[i, j] = root of subtree with keys k_i, ..., k_j, for 1 ≤ i ≤ j ≤ n.

One other table... don’t recompute w(i, j) from scratch every time we need it.
(Would take Θ(j - i) additions.)

Instead:

• Table w[1 . . n + 1, 0 . . n]

• w[i, i - 1] = 0 for 1 ≤ i ≤ n

• w[i, j] = w[i, j - 1] + p_j for 1 ≤ i ≤ j ≤ n

Can compute all Θ(n^2) values in O(1) time each.

Optimal-BST(p, q, n)
for i ← 1 to n + 1
    do e[i, i - 1] ← 0
       w[i, i - 1] ← 0
for l ← 1 to n
    do for i ← 1 to n - l + 1
           do j ← i + l - 1
              e[i, j] ← ∞
              w[i, j] ← w[i, j - 1] + p_j
              for r ← i to j
                  do t ← e[i, r - 1] + e[r + 1, j] + w[i, j]
                     if t < e[i, j]
                        then e[i, j] ← t
                             root[i, j] ← r
return e and root

First for loop initializes e, w entries for subtrees with 0 keys.

Main for loop:

• Iteration for l works on subtrees with l keys.

• Idea: compute in order of subtree sizes, smaller (1 key) to larger (n keys).

For example at beginning:

    w                  j
            0     1     2     3     4     5
        1   0    .25   .45   .5    .7    1.0
        2         0    .2    .25   .45   .75
    i   3               0    .05   .25   .55
        4                     0    .2    .5
        5                           0    .3
        6                                 0

    root            j
            1   2   3   4   5
        1   1   1   1   2   2
        2       2   2   2   4
    i   3           3   4   5
        4               4   5
        5                   5

Time: O(n^3): for loops nested 3 deep, each loop index takes on ≤ n values. Can
also show Ω(n^3). Therefore, Θ(n^3).


Construct an optimal solution

Construct-Optimal-BST(root)
r ← root[1, n]
print “k”_r “is the root”
Construct-Opt-Subtree(1, r - 1, r, “left”, root)
Construct-Opt-Subtree(r + 1, n, r, “right”, root)

Construct-Opt-Subtree(i, j, r, dir, root)
if i ≤ j
   then t ← root[i, j]
        print “k”_t “is” dir “child of k”_r
        Construct-Opt-Subtree(i, t - 1, t, “left”, root)
        Construct-Opt-Subtree(t + 1, j, t, “right”, root)
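The two procedures for optimal BSTs can be tried out with the Python sketch below, which follows the simplified model of these notes (successful searches only). The function names are ours, p[0] is a placeholder so that p[i] lines up with key k_i, and the tree description is returned as a list of strings rather than printed.

```python
import math


def optimal_bst(p):
    """Fill e and root for probabilities p[1..n]; p[0] is an unused placeholder."""
    n = len(p) - 1
    # Rows 1..n+1 and columns 0..n are used, as in the notes; entries
    # e[i][i-1] and w[i][i-1] (empty subtrees) keep their initial value 0.
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]
    for l in range(1, n + 1):                 # subtree size
        for i in range(1, n - l + 2):
            j = i + l - 1
            e[i][j] = math.inf
            w[i][j] = w[i][j - 1] + p[j]
            for r in range(i, j + 1):         # candidate root k_r
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root


def describe_tree(root, n):
    """Return the lines that Construct-Optimal-BST would print."""
    lines = []

    def walk(i, j, parent, side):
        if i <= j:
            t = root[i][j]
            lines.append(f"k{t} is {side} child of k{parent}")
            walk(i, t - 1, t, "left")
            walk(t + 1, j, t, "right")

    r = root[1][n]
    lines.append(f"k{r} is the root")
    walk(1, r - 1, r, "left")
    walk(r + 1, n, r, "right")
    return lines


p = [0.0, 0.25, 0.2, 0.05, 0.2, 0.3]
e, root = optimal_bst(p)
print(round(e[1][5], 2))
print("\n".join(describe_tree(root, 5)))
```

For the example probabilities the optimal expected cost comes out as 2.1. Several roots tie for some subproblems, so with floating-point arithmetic the reported tree may differ from the one in the root table while having the same cost.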
Elements of dynamic programming

Mentioned already: 

• optimal substructure 

• overlapping subproblems 


Optimal substructure 

• Show that a solution to a problem consists of making a choice, which leaves
one or more subproblems to solve.

• Suppose that you are given this last choice that leads to an optimal solution. [We
find that students often have trouble understanding the relationship between
optimal substructure and determining which choice is made in an optimal
solution. One way that helps them understand optimal substructure is to imagine
that “God” tells you what was the last choice made in an optimal solution.]

• Given this choice, determine which subproblems arise and how to characterize 
the resulting space of subproblems. 

• Show that the solutions to the subproblems used within the optimal solution 
must themselves be optimal. Usually use cut-and-paste: 

• Suppose that one of the subproblem solutions is not optimal. 

• Cut it out. 

• Paste in an optimal solution. 

• Get a better solution to the original problem. Contradicts optimality of
problem solution.

That was optimal substructure. 

Need to ensure that you consider a wide enough range of choices and subproblems 
that you get them all. [“God” is too busy to tell you what that last choice really 
was.] Try all the choices, solve all the subproblems resulting from each choice, 
and pick the choice whose solution, along with subproblem solutions, is best. 

How to characterize the space of subproblems? 

• Keep the space as simple as possible. 

• Expand it as necessary. 

Examples: 

Assembly-line scheduling 

• Space of subproblems was fastest way from factory entry through stations
S_{1,j} and S_{2,j}.

• No need to try a more general space of subproblems. 

Optimal binary search trees 

• Suppose we had tried to constrain space of subproblems to subtrees with
keys k_1, k_2, ..., k_j.

• An optimal BST would have root k_r, for some 1 ≤ r ≤ j.

• Get subproblems k_1, ..., k_{r-1} and k_{r+1}, ..., k_j.

• Unless we could guarantee that r = j, so that subproblem with k_{r+1}, ..., k_j
is empty, then this subproblem is not of the form k_1, k_2, ..., k_j.

• Thus, needed to allow the subproblems to vary at “both ends,” i.e., allow
both i and j to vary.

Optimal substructure varies across problem domains: 

1. How many subproblems are used in an optimal solution. 

2. How many choices in determining which subproblem(s) to use. 

• Assembly-line scheduling: 

• 1 subproblem 

  • 2 choices (for S_{i,j} use either S_{1,j-1} or S_{2,j-1})

• Longest common subsequence:

  • 1 subproblem

  • Either

    • 1 choice (if x_i = y_j, LCS of X_{i-1} and Y_{j-1}), or

    • 2 choices (if x_i ≠ y_j, LCS of X_{i-1} and Y, and LCS of X and Y_{j-1})

• Optimal binary search tree:

  • 2 subproblems (k_i, ..., k_{r-1} and k_{r+1}, ..., k_j)

  • j - i + 1 choices for k_r in k_i, ..., k_j. Once we determine optimal solutions
    to subproblems, we choose from among the j - i + 1 candidates for k_r.

Informally, running time depends on (# of subproblems overall) × (# of choices).

• Assembly-line scheduling: Θ(n) subproblems, 2 choices for each
⇒ Θ(n) running time.

• Longest common subsequence: Θ(mn) subproblems, ≤ 2 choices for each
⇒ Θ(mn) running time.

• Optimal binary search tree: Θ(n^2) subproblems, O(n) choices for each
⇒ O(n^3) running time.

Dynamic programming uses optimal substructure bottom up. 

• First find optimal solutions to subproblems. 

• Then choose which to use in optimal solution to the problem. 

When we look at greedy algorithms, we’ll see that they work top down : first make 
a choice that looks best, then solve the resulting subproblem. 

Don’t be fooled into thinking optimal substructure applies to all optimization
problems. It doesn’t.

Here are two problems that look similar. In both, we’re given an unweighted,
directed graph G = (V, E).

• V is a set of vertices.

• E is a set of edges.

And we ask about finding a path (sequence of connected edges) from vertex u to
vertex v.

• Shortest path: find path u ⇝ v with fewest edges. Must be simple (no cycles),
since removing a cycle from a path gives a path with fewer edges.

• Longest simple path: find simple path u ⇝ v with most edges. If didn’t require
simple, could repeatedly traverse a cycle to make an arbitrarily long path.

Shortest path has optimal substructure. 


[Figure: path p from u to v passing through w; p_1 is the portion u ⇝ w and p_2 is the portion w ⇝ v.]

• Suppose p is shortest path u ⇝ v.

• Let w be any vertex on p.

• Let p_1 be the portion of p, u ⇝ w.

• Then p_1 is a shortest path u ⇝ w.

Proof Suppose there exists a shorter path p'_1, u ⇝ w. Cut out p_1, replace it
with p'_1, get path u ⇝ w ⇝ v (p'_1 followed by p_2) with fewer edges than p. ■

Therefore, can find shortest path u ⇝ v by considering all intermediate vertices w,
then finding shortest paths u ⇝ w and w ⇝ v.

Same argument applies to p_2.

Does longest path have optimal substructure? 

• It seems like it should. 

• It does not. 



Consider q → r → t = longest path q ⇝ t. Are its subpaths longest paths?
No!

• Subpath q ⇝ r is q → r.

• Longest simple path q ⇝ r is q → s → t → r.

• Subpath r ⇝ t is r → t.

• Longest simple path r ⇝ t is r → q → s → t.

Not only isn’t there optimal substructure, but we can’t even assemble a legal
solution from solutions to subproblems.

Combine longest simple paths: q → s → t → r → q → s → t

Not simple!

In fact, this problem is NP-complete (so it probably has no optimal substructure to
find).

What’s the big difference between shortest path and longest path? 

• Shortest path has independent subproblems. 

• Solution to one subproblem does not affect solution to another subproblem of 
the same problem. 

• Longest simple path: subproblems are not independent. 

• Consider subproblems of longest simple paths q ⇝ r and r ⇝ t.

• Longest simple path q ⇝ r uses s and t.

• Cannot use s and t to solve longest simple path r ⇝ t, since if we do, the path
isn’t simple.

• But we have to use t to find longest simple path r ⇝ t!

• Using resources (vertices) to solve one subproblem renders them unavailable to 
solve the other subproblem. 

[For shortest paths, if we look at a shortest path u ⇝ w ⇝ v (p_1 followed by p_2),
no vertex other than w can appear in p_1 and p_2. Otherwise, we have a cycle.]

Independent subproblems in our examples:

• Assembly line and longest common subsequence

  • 1 subproblem ⇒ automatically independent.

• Optimal binary search tree

  • k_i, ..., k_{r-1} and k_{r+1}, ..., k_j ⇒ independent.


Overlapping subproblems 


These occur when a recursive algorithm revisits the same problem over and over. 

Good divide-and-conquer algorithms usually generate a brand new problem at each 
stage of recursion. 

Example: merge sort

Won’t go through exercise of showing repeated subproblems.

Book has a good example for matrix-chain multiplication. 

Alternative approach: memoization 

• “Store, don’t recompute.” 

• Make a table indexed by subproblem. 

• When solving a subproblem: 

• Lookup in table. 

• If answer is there, use it. 

• Else, compute answer, then store it. 

• In dynamic programming, we go one step further. We determine in what order 
we’d want to access the table, and fill it in that way. 
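As a small illustration of "store, don't recompute", here is a memoized version of the LCS-length recurrence in Python. It is a sketch under our own naming (lcs_length_memo and table are not from the text) and uses a dictionary as the table indexed by subproblem (i, j).

```python
def lcs_length_memo(X, Y):
    """Top-down LCS length with a table of already-solved subproblems."""
    table = {}                       # maps (i, j) to the length of an LCS of X_i and Y_j

    def c(i, j):
        if i == 0 or j == 0:
            return 0
        if (i, j) in table:          # look up in table; if answer is there, use it
            return table[(i, j)]
        if X[i - 1] == Y[j - 1]:     # else compute the answer ...
            answer = c(i - 1, j - 1) + 1
        else:
            answer = max(c(i - 1, j), c(i, j - 1))
        table[(i, j)] = answer       # ... then store it
        return answer

    return c(len(X), len(Y))


print(lcs_length_memo("spanking", "amputation"))
```

Each of the mn subproblems is solved at most once, so the running time is O(mn), like the bottom-up version. The bottom-up version avoids the recursion overhead by fixing the fill order in advance.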



Solutions for Chapter 15 
Dynamic Programming 


Solution to Exercise 15.1-5 

If l_1[j] = 2, then the fastest way to go through station j on line 1 is by changing
lines from station j - 1 on line 2. This means that f_2[j - 1] + t_{2,j-1} + a_{1,j} <
f_1[j - 1] + a_{1,j}. Dropping a_{1,j} from both sides of the inequality yields f_2[j - 1] +
t_{2,j-1} < f_1[j - 1].

If l_2[j] = 1, then the fastest way to go through station j on line 2 is by changing
lines from station j - 1 on line 1. This means that f_1[j - 1] + t_{1,j-1} + a_{2,j} <
f_2[j - 1] + a_{2,j}. Dropping a_{2,j} from both sides of the inequality yields f_1[j - 1] +
t_{1,j-1} < f_2[j - 1].

We can derive a contradiction by combining the two inequalities as follows:
f_2[j - 1] + t_{2,j-1} < f_1[j - 1] and f_1[j - 1] + t_{1,j-1} < f_2[j - 1] yields
f_2[j - 1] + t_{2,j-1} + t_{1,j-1} < f_2[j - 1]. Since all transfer costs are
nonnegative, the resulting inequality cannot hold. We conclude that we cannot have the
situation where l_1[j] = 2 and l_2[j] = 1.


Solution to Exercise 15.2-4 


Each time the l-loop executes, the i-loop executes n - l + 1 times. Each time the
i-loop executes, the k-loop executes j - i = l - 1 times, each time referencing
m twice. Thus the total number of times that an entry of m is referenced while
computing other entries is \sum_{l=2}^{n} (n - l + 1)(l - 1)2. Thus,

\[
\begin{aligned}
\sum_{i=1}^{n} \sum_{j=i}^{n} R(i, j)
  &= \sum_{l=2}^{n} (n - l + 1)(l - 1) 2 \\
  &= 2 \sum_{l=1}^{n-1} (n - l) l \\
  &= 2 \sum_{l=1}^{n-1} n l - 2 \sum_{l=1}^{n-1} l^2 \\
  &= 2 \, \frac{n(n-1)n}{2} - 2 \, \frac{(n-1)n(2n-1)}{6} \\
  &= n^3 - n^2 - \frac{2n^3 - 3n^2 + n}{3} \\
  &= \frac{n^3 - n}{3} .
\end{aligned}
\]
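The closed form can be checked by brute force. The Python sketch below (the function name is ours) runs the same three nested loops described in the solution, adds 2 for the two table references made per value of k, and compares the total with (n^3 - n)/3.

```python
def count_m_references(n):
    """Count references to table m made while computing other entries."""
    refs = 0
    for l in range(2, n + 1):              # l-loop: chain length
        for i in range(1, n - l + 2):      # i-loop: n - l + 1 iterations
            j = i + l - 1
            for k in range(i, j):          # k-loop: j - i = l - 1 iterations
                refs += 2                  # reads m[i, k] and m[k + 1, j]
    return refs


# n^3 - n is always divisible by 3, so integer division is exact here.
matches = all(count_m_references(n) == (n ** 3 - n) // 3 for n in range(1, 40))
print(matches)
```

This prints True. For example, n = 3 gives 8 references, and (27 - 3)/3 = 8.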

Solution to Exercise 15.3-1 

Running Recursive-Matrix-Chain is asymptotically more efficient than
enumerating all the ways of parenthesizing the product and computing the number of
multiplications for each.

Consider the treatment of subproblems by the two approaches. 

• For each possible place to split the matrix chain, the enumeration approach 
finds all ways to parenthesize the left half, finds all ways to parenthesize the 
right half, and looks at all possible combinations of the left half with the right 
half. The amount of work to look at each combination of left- and right-half 
subproblem results is thus the product of the number of ways to do the left half 
and the number of ways to do the right half. 

• For each possible place to split the matrix chain, Recursive-Matrix-Chain
finds the best way to parenthesize the left half, finds the best way to parenthesize
the right half, and combines just those two results. Thus the amount of work to
combine the left- and right-half subproblem results is O(1).

Section 15.2 argued that the running time for enumeration is Ω(4^n / n^{3/2}). We will
show that the running time for Recursive-Matrix-Chain is O(n 3^{n-1}).

To get an upper bound on the running time of Recursive-Matrix-Chain, we’ll
use the same approach used in Section 15.2 to get a lower bound: Derive a
recurrence of the form T(n) ≤ ... and solve it by substitution. For the lower-bound
recurrence, the book assumed that the execution of lines 1-2 and 6-7 each take at
least unit time. For the upper-bound recurrence, we’ll assume those pairs of lines
each take at most constant time c. Thus, we have the recurrence

\[
T(n) \le
\begin{cases}
c & \text{if } n = 1 , \\
c + \sum\limits_{k=1}^{n-1} (T(k) + T(n-k) + c) & \text{if } n \ge 2 .
\end{cases}
\]

This is just like the book’s ≥ recurrence except that it has c instead of 1, and so we
can rewrite it as

\[ T(n) \le 2 \sum_{i=1}^{n-1} T(i) + cn . \]

We shall prove that T(n) = O(n 3^{n-1}) using the substitution method. (Note: Any
upper bound on T(n) that is o(4^n / n^{3/2}) will suffice. You might prefer to prove one
that is easier to think up, such as T(n) = O(3.5^n).) Specifically, we shall show
that T(n) ≤ cn 3^{n-1} for all n ≥ 1. The basis is easy, since T(1) ≤ c = c · 1 · 3^{1-1}.

Inductively, for n ≥ 2 we have

\[
\begin{aligned}
T(n) &\le 2 \sum_{i=1}^{n-1} T(i) + cn \\
     &\le 2 \sum_{i=1}^{n-1} c i 3^{i-1} + cn \\
     &= c \cdot \left( 2 \sum_{i=1}^{n-1} i 3^{i-1} + n \right) \\
     &= c \cdot \left( 2 \cdot \left( \frac{n 3^{n-1}}{3 - 1} + \frac{1 - 3^n}{(3 - 1)^2} \right) + n \right)
        \quad \text{(see below)} \\
     &= cn 3^{n-1} + c \cdot \left( \frac{1 - 3^n}{2} + n \right) \\
     &= cn 3^{n-1} + \frac{c}{2} (2n + 1 - 3^n) \\
     &\le cn 3^{n-1} \quad \text{for all } c > 0, \ n \ge 1 .
\end{aligned}
\]


Running Recursive-Matrix-Chain takes O(n 3^{n-1}) time, and enumerating all
parenthesizations takes Ω(4^n / n^{3/2}) time, and so Recursive-Matrix-Chain is
more efficient than enumeration.

Note: The above substitution uses the fact that

\[ \sum_{i=1}^{n-1} i x^{i-1} = \frac{n x^{n-1}}{x - 1} + \frac{1 - x^n}{(x - 1)^2} . \]

This equation can be derived from equation (A.5) by taking the derivative. Let

\[ f(x) = \sum_{i=1}^{n-1} x^i = \frac{x^n - 1}{x - 1} - 1 . \]

Then

\[ \sum_{i=1}^{n-1} i x^{i-1} = f'(x) = \frac{n x^{n-1}}{x - 1} + \frac{1 - x^n}{(x - 1)^2} . \]

Solution to Exercise 15.4-4 

When computing a particular row of the c table, no rows before the previous row
are needed. Thus only two rows (2 · length[Y] entries) need to be kept in memory
at a time. (Note: Each row of c actually has length[Y] + 1 entries, but we don’t
need to store the column of 0’s; instead we can make the program “know” that
those entries are 0.) With this idea, we need only 2 · min(m, n) entries if we always
call LCS-Length with the shorter sequence as the Y argument.

We can thus do away with the c table as follows:

• Use two arrays of length min(m, n), previous-row and current-row, to hold the
appropriate rows of c.

• Initialize previous-row to all 0 and compute current-row from left to right.

• When current-row is filled, if there are still more rows to compute, copy
current-row into previous-row and compute the new current-row.

Actually only a little more than one row’s worth of c entries (min(m, n) + 1
entries) are needed during the computation. The only entries needed in the table
when it is time to compute c[i, j] are c[i, k] for k ≤ j - 1 (i.e., earlier entries in
the current row, which will be needed to compute the next row); and c[i - 1, k] for
k ≥ j - 1 (i.e., entries in the previous row that are still needed to compute the rest
of the current row). This is one entry for each k from 1 to min(m, n) except that
there are two entries with k = j - 1, hence the additional entry needed besides the
one row’s worth of entries.

We can thus do away with the c table as follows:

• Use an array a of length min(m, n) + 1 to hold the appropriate entries of c. At
the time c[i, j] is to be computed, a will hold the following entries:

  • a[k] = c[i, k] for 1 ≤ k < j - 1 (i.e., earlier entries in the current “row”),

  • a[k] = c[i - 1, k] for k ≥ j - 1 (i.e., entries in the previous “row”),

  • a[0] = c[i, j - 1] (i.e., the previous entry computed, which couldn’t be put
    into the “right” place in a without erasing the still-needed c[i - 1, j - 1]).

• Initialize a to all 0 and compute the entries from left to right.

  • Note that the 3 values needed to compute c[i, j] for j > 1 are in a[0] =
    c[i, j - 1], a[j - 1] = c[i - 1, j - 1], and a[j] = c[i - 1, j].

  • When c[i, j] has been computed, move a[0] (c[i, j - 1]) to its “correct”
    place, a[j - 1], and put c[i, j] in a[0].
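The one-array scheme can be sketched in Python as follows. The function name is ours, and two row-boundary details are our own assumptions about how to finish the bookkeeping: at the end of each row the last computed entry is moved from a[0] into a[n], and a[0] is reset to 0 (the value of c[i, 0]) before the next row starts.

```python
def lcs_length_small(X, Y):
    """LCS length using a single array of min(m, n) + 1 entries."""
    if len(Y) > len(X):
        X, Y = Y, X                  # make Y the shorter sequence
    n = len(Y)
    a = [0] * (n + 1)                # a[0] is the scratch slot
    for xi in X:
        a[0] = 0                     # c[i, 0] = 0 at the start of each row
        for j in range(1, n + 1):
            # Here a[0] = c[i, j-1], a[j-1] = c[i-1, j-1] (for j > 1),
            # and a[j] = c[i-1, j].
            diag = a[j - 1] if j > 1 else 0
            if xi == Y[j - 1]:
                new = diag + 1
            else:
                new = max(a[j], a[0])
            if j > 1:
                a[j - 1] = a[0]      # move c[i, j-1] to its "correct" place
            a[0] = new               # keep c[i, j] in the scratch slot
        a[n] = a[0]                  # end of row: store c[i, n]
    return a[n]


print(lcs_length_small("spanking", "amputation"))
```

For this illustrative pair of words the sketch prints 4, the same length the full-table algorithm computes.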


Solution to Problem 15-1 

Taking the book’s hint, we sort the points by x-coordinate, left to right, in O(n lg n)
time. Let the sorted points be, left to right, (p_1, p_2, p_3, ..., p_n). Therefore, p_1 is
the leftmost point, and p_n is the rightmost.

We define as our subproblems paths of the following form, which we call bitonic
paths. A bitonic path P_{i,j}, where i ≤ j, includes all points p_1, p_2, ..., p_j; it
starts at some point p_i, goes strictly left to point p_1, and then goes strictly right to
point p_j. By “going strictly left,” we mean that each point in the path has a lower x-
coordinate than the previous point. Looked at another way, the indices of the sorted
points form a strictly decreasing sequence. Likewise, “going strictly right” means
that the indices of the sorted points form a strictly increasing sequence. Moreover,
P_{i,j} contains all the points p_1, p_2, p_3, ..., p_j. Note that p_j is the rightmost point
in P_{i,j} and is on the rightgoing subpath. The leftgoing subpath may be degenerate,
consisting of just p_1.

Let us denote the euclidean distance between any two points p_i and p_j by |p_i p_j|.
And let us denote by b[i, j], for 1 ≤ i ≤ j ≤ n, the length of the shortest bitonic
path P_{i,j}. Since the leftgoing subpath may be degenerate, we can easily compute
all values b[1, j]. The only value of b[i, i] that we will need is b[n, n], which is
the length of the shortest bitonic tour. We have the following formulation of b[i, j]
for 1 ≤ i ≤ j ≤ n:

\begin{align*}
b[1,2]   &= |p_1 p_2| , \\
b[i,j]   &= b[i,j-1] + |p_{j-1} p_j| \quad \text{for } i < j-1 , \\
b[j-1,j] &= \min_{1 \le k < j-1} \{ b[k,j-1] + |p_k p_j| \} .
\end{align*}

Why are these formulas correct? Any bitonic path ending at p_2 has p_2 as its
rightmost point, so it consists only of p_1 and p_2. Its length, therefore, is |p_1 p_2|.

Now consider a shortest bitonic path P_{i,j}. The point p_{j-1} is somewhere on this
path. If it is on the rightgoing subpath, then it immediately precedes p_j on this
subpath. Otherwise, it is on the leftgoing subpath, and it must be the rightmost
point on this subpath, so i = j - 1. In the first case, the subpath from p_i to p_{j-1}
must be a shortest bitonic path P_{i,j-1}, for otherwise we could use a cut-and-paste
argument to come up with a shorter bitonic path than P_{i,j}. (This is part of our
optimal substructure.) The length of P_{i,j}, therefore, is given by b[i, j - 1] + |p_{j-1} p_j|.
In the second case, p_j has an immediate predecessor p_k, where k < j - 1, on
the rightgoing subpath. Optimal substructure again applies: the subpath from p_k
to p_{j-1} must be a shortest bitonic path P_{k,j-1}, for otherwise we could use cut-and-
paste to come up with a shorter bitonic path than P_{i,j}. (We have implicitly relied
on paths having the same length regardless of which direction we traverse them.)
The length of P_{i,j}, therefore, is given by min_{1 ≤ k < j-1} {b[k, j - 1] + |p_k p_j|}.

We need to compute b[n, n]. In an optimal bitonic tour, one of the points adjacent
to p_n must be p_{n-1}, and so we have

b[n, n] = b[n - 1, n] + |p_{n-1} p_n| .

To reconstruct the points on the shortest bitonic tour, we define r[i, j] to be the
immediate predecessor of p_j on the shortest bitonic path P_{i,j}. The pseudocode
below shows how we compute b[i, j] and r[i, j]:

Euclidean-TSP(p)
sort the points so that (p_1, p_2, p_3, ..., p_n) are in order of increasing x-coordinate
b[1, 2] ← |p_1 p_2|
for j ← 3 to n
    do for i ← 1 to j - 2
           do b[i, j] ← b[i, j - 1] + |p_{j-1} p_j|
              r[i, j] ← j - 1
       b[j - 1, j] ← ∞
       for k ← 1 to j - 2
           do q ← b[k, j - 1] + |p_k p_j|
              if q < b[j - 1, j]
                 then b[j - 1, j] ← q
                      r[j - 1, j] ← k
b[n, n] ← b[n - 1, n] + |p_{n-1} p_n|
return b and r

We print out the tour we found by starting at p_n, then a leftgoing subpath that
includes p_{n-1}, from right to left, until we hit p_1. Then we print left-to-right the
remaining subpath, which does not include p_{n-1}. For the example in Figure 15.9(b)
on page 365, we wish to print the sequence p_7, p_6, p_4, p_3, p_1, p_2, p_5. Our code is
recursive. The right-to-left subpath is printed as we go deeper into the recursion,
and the left-to-right subpath is printed as we back out.

Print-Tour(r, n)
print p_n
print p_{n-1}
k ← r[n - 1, n]
Print-Path(r, k, n - 1)
print p_k

Print-Path(r, i, j)
if i < j
   then k ← r[i, j]
        print p_k
        if k > 1
           then Print-Path(r, i, k)
   else k ← r[j, i]
        if k > 1
           then Print-Path(r, k, j)
                print p_k

The relative values of the parameters i and j in each call of Print-Path indicate 
which subpath we’re working on. If i < j, we’re on the right-to-left subpath, and 
if i > j , we’re on the left-to-right subpath. 

The time to run Euclidean-TSP is O(n^2) since the outer loop on j iterates n - 2
times and the inner loops on i and k each run at most n - 2 times. The sorting step
at the beginning takes O(n lg n) time, which the loop times dominate. The time to
run Print-Tour is O(n), since each point is printed just once.
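As a rough check of the recurrence for b[i, j], here is a minimal Python sketch that computes only the tour length (it does not build the r table or print the tour). The function name bitonic_tour_length and the representation of points as (x, y) tuples are assumptions of this sketch, and it relies on the problem’s assumption that no two points share an x-coordinate.

```python
import math

def bitonic_tour_length(points):
    """Length of a shortest bitonic tour, following the b[i, j] recurrence."""
    pts = sorted(points)                 # p_1, ..., p_n by increasing x
    n = len(pts)
    if n < 2:
        return 0.0

    def dist(i, j):                      # |p_i p_j| with 1-based indices
        return math.dist(pts[i - 1], pts[j - 1])

    # b[i][j] = length of a shortest bitonic path P_{i,j}, for 1 <= i < j <= n
    b = [[0.0] * (n + 1) for _ in range(n + 1)]
    b[1][2] = dist(1, 2)
    for j in range(3, n + 1):
        for i in range(1, j - 1):        # i = 1, ..., j - 2
            b[i][j] = b[i][j - 1] + dist(j - 1, j)
        b[j - 1][j] = min(b[k][j - 1] + dist(k, j) for k in range(1, j - 1))
    return b[n - 1][n] + dist(n - 1, n)  # b[n, n]
```

For four points in convex position, such as (0, 0), (1, 1), (2, -1), (3, 0), the shortest bitonic tour is the convex hull, which gives an easy value to compare against.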


Solution to Problem 15-2 


Note: we will assume that no word is longer than will fit into a line, i.e., l_i ≤ M
for all i.


First, we’ll make some definitions so that we can state the problem more uniformly. 
Special cases about the last line and worries about whether a sequence of words fits 
in a line will be handled in these definitions, so that we can forget about them when 
framing our overall strategy. 

• Define extras[i, j] = M - j + i - \sum_{k=i}^{j} l_k to be the number of extra spaces
  at the end of a line containing words i through j. Note that extras may be negative.

• Now define the cost of including a line containing words i through j in the sum
  we want to minimize:

\[
lc[i,j] = \begin{cases}
\infty          & \text{if } extras[i,j] < 0 \text{ (i.e., words don't fit)} , \\
0               & \text{if } j = n \text{ and } extras[i,j] \ge 0 \text{ (last line costs 0)} , \\
(extras[i,j])^3 & \text{otherwise} .
\end{cases}
\]






By making the line cost infinite when the words don’t fit on it, we prevent such
an arrangement from being part of a minimal sum, and by making the cost 0 for
the last line (if the words fit), we prevent the arrangement of the last line from
influencing the sum being minimized.


We want to minimize the sum of lc over all lines of the paragraph.

Our subproblems are how to optimally arrange words 1, ..., j, where j = 1, ..., n.

Consider an optimal arrangement of words 1, ..., j. Suppose we know that the
last line, which ends in word j, begins with word i. The preceding lines, therefore,
contain words 1, ..., i - 1. In fact, they must contain an optimal arrangement of
words 1, ..., i - 1. (Insert your favorite cut-and-paste argument here.)

Let c[j] be the cost of an optimal arrangement of words 1, ..., j. If we know that
the last line contains words i, ..., j, then c[j] = c[i - 1] + lc[i, j]. As a base case,
when we’re computing c[1], we need c[0]. If we set c[0] = 0, then c[1] = lc[1, 1],
which is what we want.


But of course we have to figure out which word begins the last line for the
subproblem of words 1, ..., j. So we try all possibilities for word i, and we pick the
one that gives the lowest cost. Here, i ranges from 1 to j. Thus, we can define c[j]
recursively by

\[
c[j] = \begin{cases}
0                                            & \text{if } j = 0 , \\
\min_{1 \le i \le j} (c[i-1] + lc[i,j])      & \text{if } j > 0 .
\end{cases}
\]

Note that the way we defined lc ensures that

• all choices made will fit on the line (since an arrangement with lc = ∞ cannot
  be chosen as the minimum), and

• the cost of putting words i, ..., j on the last line will not be 0 unless this really
  is the last line of the paragraph (j = n) or words i, ..., j fill the entire line.


We can compute a table of c values from left to right, since each value depends 
only on earlier values. 

To keep track of what words go on what lines, we can keep a parallel p table that
points to where each c value came from. When c[j] is computed, if c[j] is based
on the value of c[k - 1], set p[j] = k. Then after c[n] is computed, we can trace
the pointers to see where to break the lines. The last line starts at word p[n] and
goes through word n. The previous line starts at word p[p[n] - 1] and goes through
word p[n] - 1, etc.

In pseudocode, here’s how we construct the tables:

Print-Neatly(l, n, M)
▷ Compute extras[i, j] for 1 ≤ i ≤ j ≤ n.
for i ← 1 to n
    do extras[i, i] ← M - l_i
       for j ← i + 1 to n
           do extras[i, j] ← extras[i, j - 1] - l_j - 1
▷ Compute lc[i, j] for 1 ≤ i ≤ j ≤ n.
for i ← 1 to n
    do for j ← i to n
           do if extras[i, j] < 0
                 then lc[i, j] ← ∞
              elseif j = n and extras[i, j] ≥ 0
                 then lc[i, j] ← 0
              else lc[i, j] ← (extras[i, j])^3
▷ Compute c[j] and p[j] for 1 ≤ j ≤ n.
c[0] ← 0
for j ← 1 to n
    do c[j] ← ∞
       for i ← 1 to j
           do if c[i - 1] + lc[i, j] < c[j]
                 then c[j] ← c[i - 1] + lc[i, j]
                      p[j] ← i
return c and p

Quite clearly, both the time and space are O(n^2).

In fact, we can do a bit better: we can get both the time and space down to O(nM).
The key observation is that at most ⌈M/2⌉ words can fit on a line. (Each word is
at least one character long, and there’s a space between words.) Since a line with
words i, ..., j contains j - i + 1 words, if j - i + 1 > ⌈M/2⌉ then we know
that lc[i, j] = ∞. We need only compute and store extras[i, j] and lc[i, j] for
j - i + 1 ≤ ⌈M/2⌉. And the inner for loop header in the computation of c[j]
and p[j] can run from max(1, j - ⌈M/2⌉ + 1) to j.

We can reduce the space even further to O(n). We do so by not storing the lc
and extras tables, and instead computing the value of lc[i, j] as needed in the last
loop. The idea is that we could compute lc[i, j] in O(1) time if we knew the
value of extras[i, j]. And if we scan for the minimum value in descending order
of i, we can compute that as extras[i, j] = extras[i + 1, j] - l_i - 1. (Initially,
extras[j, j] = M - l_j.) This improvement reduces the space to Θ(n), since now
the only tables we store are c and p.

Here’s how we print which words are on which line. The printed output of
Give-Lines(p, j) is a sequence of triples (k, i, j), indicating that words i, ..., j
are printed on line k. The return value is the line number k.

Give-Lines(p, j)
i ← p[j]
if i = 1
   then k ← 1
   else k ← Give-Lines(p, i - 1) + 1
print (k, i, j)
return k

The initial call is Give-Lines(p, n). Since the value of j decreases in each
recursive call, Give-Lines takes a total of O(n) time.
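The linear-space variant described above (scanning i downward and updating extras incrementally) might look like the following Python sketch. The names print_neatly and give_lines, the 0-based list of word lengths, and the iterative form of give_lines are choices made for this illustration; like the notes, it assumes every word length is at most M.

```python
import math

def print_neatly(lengths, M):
    """Return tables c and p (1-based; index 0 is the base case)."""
    n = len(lengths)
    c = [0] + [math.inf] * n
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        extras = M + 1                   # so the first update gives M - l_j
        for i in range(j, 0, -1):        # descending i, as in the text
            extras -= lengths[i - 1] + 1 # extras[i, j] = extras[i+1, j] - l_i - 1
            if extras < 0:               # words i..j no longer fit: lc = infinity
                break
            lc = 0 if j == n else extras ** 3
            if c[i - 1] + lc < c[j]:
                c[j] = c[i - 1] + lc
                p[j] = i
    return c, p

def give_lines(p, j):
    """Triples (k, i, j): words i..j go on line k."""
    spans = []
    while j > 0:
        i = p[j]
        spans.append((i, j))
        j = i - 1
    spans.reverse()
    return [(k, i, j) for k, (i, j) in enumerate(spans, start=1)]
```

For word lengths 3, 2, 2, 5 and M = 6, breaking after the first word and again after the third costs 27 + 1 + 0 = 28, which beats putting the first two words together (0 + 64 + 0).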


Solution to Problem 15-3 

a. Dynamic programming is the ticket. This problem is slightly similar to the
   longest-common-subsequence problem. In fact, we’ll define the notational
   conveniences X_i and Y_j in the similar manner as we did for the LCS problem:
   X_i = x[1 .. i] and Y_j = y[1 .. j].

   Our subproblems will be determining an optimal sequence of operations that
   converts X_i to Y_j, for 0 ≤ i ≤ m and 0 ≤ j ≤ n. We’ll call this the “X_i → Y_j
   problem.” The original problem is the X_m → Y_n problem.

   Let’s suppose for the moment that we know what was the last operation used to
   convert X_i to Y_j. There are six possibilities. We denote by c[i, j] the cost of an
   optimal solution to the X_i → Y_j problem.

   • If the last operation was a copy, then we must have had x[i] = y[j]. The
     subproblem that remains is converting X_{i-1} to Y_{j-1}. And an optimal solution to
     the X_i → Y_j problem must include an optimal solution to the X_{i-1} → Y_{j-1}
     problem. The cut-and-paste argument applies. Thus, assuming that the last
     operation was a copy, we have c[i, j] = c[i - 1, j - 1] + cost(copy).

   • If it was a replace, then we must have had x[i] ≠ y[j]. (Here, we assume that
     we cannot replace a character with itself. It is a straightforward modification
     if we allow replacement of a character with itself.) We have the same optimal
     substructure argument as for copy, and assuming that the last operation was
     a replace, we have c[i, j] = c[i - 1, j - 1] + cost(replace).

   • If it was a twiddle, then we must have had x[i] = y[j - 1] and x[i - 1] =
     y[j], along with the implicit assumption that i, j ≥ 2. Now our subproblem
     is X_{i-2} → Y_{j-2} and, assuming that the last operation was a twiddle, we have
     c[i, j] = c[i - 2, j - 2] + cost(twiddle).

   • If it was a delete, then we have no restrictions on x or y. Since we can view
     delete as removing a character from X_i and leaving Y_j alone, our subproblem
     is X_{i-1} → Y_j. Assuming that the last operation was a delete, we have
     c[i, j] = c[i - 1, j] + cost(delete).

   • If it was an insert, then we have no restrictions on x or y. Our subproblem
     is X_i → Y_{j-1}. Assuming that the last operation was an insert, we have
     c[i, j] = c[i, j - 1] + cost(insert).

   • If it was a kill, then we had to have completed converting X_m to Y_n, so that
     the current problem must be the X_m → Y_n problem. In other words, we must
     have i = m and j = n. If we think of a kill as a multiple delete, we can get
     any X_i → Y_n, where 0 ≤ i < m, as a subproblem. We pick the best one,
     and so assuming that the last operation was a kill, we have

     c[m, n] = min_{0 ≤ i < m} {c[i, n]} + cost(kill) .


   We have not handled the base cases, in which i = 0 or j = 0. These are
   easy. X_0 and Y_0 are the empty strings. We convert an empty string into Y_j by
   a sequence of j inserts, so that c[0, j] = j · cost(insert). Similarly, we convert
   X_i into Y_0 by a sequence of i deletes, so that c[i, 0] = i · cost(delete). When
   i = j = 0, either formula gives us c[0, 0] = 0, which makes sense, since
   there’s no cost to convert the empty string to the empty string.

   For i, j > 0, our recursive formulation for c[i, j] applies the above formulas in
   the situations in which they hold:

\[
c[i,j] = \min \begin{cases}
c[i-1,j-1] + \text{cost(copy)}    & \text{if } x[i] = y[j] , \\
c[i-1,j-1] + \text{cost(replace)} & \text{if } x[i] \ne y[j] , \\
c[i-2,j-2] + \text{cost(twiddle)} & \text{if } i, j \ge 2,\ x[i] = y[j-1], \text{ and } x[i-1] = y[j] , \\
c[i-1,j] + \text{cost(delete)}    & \text{always} , \\
c[i,j-1] + \text{cost(insert)}    & \text{always} , \\
\min_{0 \le i < m} \{c[i,n]\} + \text{cost(kill)} & \text{if } i = m \text{ and } j = n .
\end{cases}
\]


   Like we did for LCS, our pseudocode fills in the table in row-major order, i.e.,
   row-by-row from top to bottom, and left to right within each row. Column-
   major order (column-by-column from left to right, and top to bottom within
   each column) would also work. Along with the c[i, j] table, we fill in the table
   op[i, j], holding which operation was used.

Edit-Distance(x, y, m, n)
for i ← 0 to m
    do c[i, 0] ← i · cost(delete)
       op[i, 0] ← DELETE
for j ← 0 to n
    do c[0, j] ← j · cost(insert)
       op[0, j] ← INSERT
for i ← 1 to m
    do for j ← 1 to n
           do c[i, j] ← ∞
              if x[i] = y[j]
                 then c[i, j] ← c[i - 1, j - 1] + cost(copy)
                      op[i, j] ← COPY
              if x[i] ≠ y[j] and c[i - 1, j - 1] + cost(replace) < c[i, j]
                 then c[i, j] ← c[i - 1, j - 1] + cost(replace)
                      op[i, j] ← REPLACE(by y[j])
              if i ≥ 2 and j ≥ 2 and x[i] = y[j - 1] and
                    x[i - 1] = y[j] and
                    c[i - 2, j - 2] + cost(twiddle) < c[i, j]
                 then c[i, j] ← c[i - 2, j - 2] + cost(twiddle)
                      op[i, j] ← TWIDDLE
              if c[i - 1, j] + cost(delete) < c[i, j]
                 then c[i, j] ← c[i - 1, j] + cost(delete)
                      op[i, j] ← DELETE
              if c[i, j - 1] + cost(insert) < c[i, j]
                 then c[i, j] ← c[i, j - 1] + cost(insert)
                      op[i, j] ← INSERT(y[j])
for i ← 0 to m - 1
    do if c[i, n] + cost(kill) < c[m, n]
          then c[m, n] ← c[i, n] + cost(kill)
               op[m, n] ← KILL i
return c and op

   The time and space are both Θ(mn). If we store a KILL operation in op[m, n],
   we also include the index i after which we killed, to help us reconstruct the
   optimal sequence of operations. (We don’t need to store y[j] in the op table for
   replace or insert operations.)

   To reconstruct this sequence, we use the op table returned by Edit-Distance.
   The procedure Op-Sequence(op, i, j) reconstructs the optimal operation
   sequence that we found to transform X_i into Y_j. The base case is when i = j = 0.
   The first call is Op-Sequence(op, m, n).

Op-Sequence(op, i, j)
if i = 0 and j = 0
   then return
if op[i, j] = COPY or op[i, j] = REPLACE
   then i' ← i - 1
        j' ← j - 1
elseif op[i, j] = TWIDDLE
   then i' ← i - 2
        j' ← j - 2
elseif op[i, j] = DELETE
   then i' ← i - 1
        j' ← j
elseif op[i, j] = INSERT     ▷ Don’t care yet what character is inserted.
   then i' ← i
        j' ← j - 1
else                         ▷ Must be KILL, and must have i = m and j = n.
     let op[i, j] = KILL k
     i' ← k
     j' ← j
Op-Sequence(op, i', j')
print op[i, j]

This procedure determines which subproblem we used, recurses on it, and then 
prints its own last operation. 
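For readers who want to experiment, the two procedures can be sketched in Python roughly as follows. The representation is an assumption of this sketch, not part of the solution: costs are passed as a dictionary keyed by operation name, an operation is stored as a tuple such as ("insert", ch) or ("kill", i), a missing "twiddle" or "kill" key means that operation is not permitted, and op_sequence is written iteratively and returns a list instead of printing.

```python
import math

def edit_distance(x, y, cost):
    """c[i][j] = cheapest cost of converting x[:i] into y[:j]; op records choices."""
    m, n = len(x), len(y)
    twiddle = cost.get("twiddle", math.inf)   # absent key: operation not permitted
    kill = cost.get("kill", math.inf)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    op = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        c[i][0], op[i][0] = i * cost["delete"], ("delete",)
    for j in range(1, n + 1):
        c[0][j], op[0][j] = j * cost["insert"], ("insert", y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                cands = [(c[i - 1][j - 1] + cost["copy"], ("copy",))]
            else:
                cands = [(c[i - 1][j - 1] + cost["replace"], ("replace", y[j - 1]))]
            if i >= 2 and j >= 2 and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]:
                cands.append((c[i - 2][j - 2] + twiddle, ("twiddle",)))
            cands.append((c[i - 1][j] + cost["delete"], ("delete",)))
            cands.append((c[i][j - 1] + cost["insert"], ("insert", y[j - 1])))
            # min keeps the first of equally cheap candidates, like the strict "<" tests
            c[i][j], op[i][j] = min(cands, key=lambda t: t[0])
    for i in range(m):                        # kill is considered only for c[m][n]
        if c[i][n] + kill < c[m][n]:
            c[m][n], op[m][n] = c[i][n] + kill, ("kill", i)
    return c, op

def op_sequence(op, i, j):
    """Operations, in order, that convert x[:i] into y[:j]."""
    ops = []
    while i > 0 or j > 0:
        o = op[i][j]
        ops.append(o)
        if o[0] in ("copy", "replace"):
            i, j = i - 1, j - 1
        elif o[0] == "twiddle":
            i, j = i - 2, j - 2
        elif o[0] == "delete":
            i -= 1
        elif o[0] == "insert":
            j -= 1
        else:                                 # ("kill", k): resume from X_k -> Y_n
            i = o[1]
    ops.reverse()
    return ops
```

With copy costing 0 and replace, delete, and insert costing 1 each, and no twiddle or kill, this reduces to the familiar unit-cost edit distance.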

b. The DNA-alignment problem is just the edit-distance problem, with

   cost(copy)    = -1 ,
   cost(replace) = +1 ,
   cost(delete)  = +2 ,
   cost(insert)  = +2 ,

   and the twiddle and kill operations are not permitted.

The score that we are trying to maximize in the DNA-alignment problem is 
precisely the negative of the cost we are trying to minimize in the edit-distance 
problem. The negative cost of copy is not an impediment, since we can only 
apply the copy operation when the characters are equal. 


Solution to Problem 15-6 

Denote each square by the pair (i, j), where i is the row number, j is the column
number, and 1 ≤ i, j ≤ n. Our goal is to find a most profitable way from any
square in row 1 to any square in row n. Once we do so, we can look up all the most
profitable ways to get to any square in row n and pick the best one.

A subproblem is the most profitable way to get from some square in row 1 to
a particular square (i, j). We have optimal substructure as follows. Consider a
subproblem for (i, j), where i > 1, and consider the most profitable way to (i, j).
Because of how we define legal moves, it must be through square (i - 1, j'), where
j' = j - 1, j, or j + 1. Then the way that we got to (i - 1, j') within the most
profitable way to (i, j) must itself be a most profitable way to (i - 1, j'). The usual
cut-and-paste argument applies. Suppose that in our most profitable way to (i, j),
which goes through (i - 1, j'), we earn a profit of d dollars to get to (i - 1, j'),
and then earn p((i - 1, j'), (i, j)) dollars getting from (i - 1, j') to (i, j); thus,
we earn d + p((i - 1, j'), (i, j)) dollars getting to (i, j). Now suppose that there’s
a way to (i - 1, j') that earns d' dollars, where d' > d. Then we would use that
to get to (i - 1, j') on our way to (i, j), earning d' + p((i - 1, j'), (i, j)) >
d + p((i - 1, j'), (i, j)), and thus contradicting the optimality of our way to (i, j).

We also have overlapping subproblems. We need the most profitable way to (i, j)
to find the most profitable way to (i + 1, j - 1), to (i + 1, j), and to (i + 1, j + 1).
So we’ll need to directly refer to the most profitable way to (i, j) up to three times,
and if we were to implement this algorithm recursively, we’d be solving each
subproblem many times.

Let d[i, j] be the profit we earn in the most profitable way to (i, j). Then we have
that d[1, j] = 0 for all j = 1, 2, ..., n. For i = 2, 3, ..., n, we have

\[
d[i,j] = \max \begin{cases}
d[i-1,j-1] + p((i-1,j-1),(i,j)) & \text{if } j > 1 , \\
d[i-1,j] + p((i-1,j),(i,j))     & \text{always} , \\
d[i-1,j+1] + p((i-1,j+1),(i,j)) & \text{if } j < n .
\end{cases}
\]


To keep track of how we got to (i, j) most profitably, we let w[i, j] be the value
of j' used to achieve the maximum value of d[i, j]. These values are defined for
2 ≤ i ≤ n and 1 ≤ j ≤ n.

Thus, we can run the following procedure: 


Checkerboard(n, p)
for j ← 1 to n
    do d[1, j] ← 0
for i ← 2 to n
    do for j ← 1 to n
           do d[i, j] ← -∞
              if j > 1
                 then d[i, j] ← d[i - 1, j - 1] + p((i - 1, j - 1), (i, j))
                      w[i, j] ← j - 1
              if d[i - 1, j] + p((i - 1, j), (i, j)) > d[i, j]
                 then d[i, j] ← d[i - 1, j] + p((i - 1, j), (i, j))
                      w[i, j] ← j
              if j < n and d[i - 1, j + 1] + p((i - 1, j + 1), (i, j)) > d[i, j]
                 then d[i, j] ← d[i - 1, j + 1] + p((i - 1, j + 1), (i, j))
                      w[i, j] ← j + 1
return d and w


Once we fill in the d[i, j] table, the profit earned by the most profitable way to any
square along the top row is max_{1 ≤ j ≤ n} {d[n, j]}.

To actually compute the set of moves, we use the usual recursive backtracking
method. This procedure prints the squares visited, from row 1 to row n:

Print-Moves(w, i, j)
if i > 1
   then Print-Moves(w, i - 1, w[i, j])
print “(” i “,” j “)”

Letting t be a column index j that achieves max_{1 ≤ j ≤ n} {d[n, j]}, the initial call
is Print-Moves(w, n, t).

The time to run Checkerboard is clearly O(n^2). Once we have computed the
d and w tables, Print-Moves runs in Θ(n) time, which we can see by observing
that i = n in the initial call and i decreases by 1 in each recursive call.
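A small Python sketch of the same computation follows. It assumes the profit function p is supplied as a callable taking two (row, column) pairs, uses 1-based tables padded with an unused index 0, and replaces the recursive printing with a helper, best_path, that returns the list of squares; these names and conventions are illustrative only.

```python
import math

def checkerboard(n, p):
    """Fill d[i][j] (best profit to reach (i, j)) and w[i][j] (column used in row i-1)."""
    d = [[0] * (n + 1) for _ in range(n + 1)]   # row 1 stays 0
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(2, n + 1):
        for j in range(1, n + 1):
            d[i][j] = -math.inf
            for jp in (j - 1, j, j + 1):        # legal predecessor columns
                if 1 <= jp <= n:
                    val = d[i - 1][jp] + p((i - 1, jp), (i, j))
                    if val > d[i][j]:
                        d[i][j], w[i][j] = val, jp
    return d, w

def best_path(d, w, n):
    """Squares visited, from row 1 to row n, on a most profitable way."""
    j = max(range(1, n + 1), key=lambda col: d[n][col])  # column achieving the max
    path = [(n, j)]
    for i in range(n, 1, -1):
        j = w[i][j]
        path.append((i - 1, j))
    path.reverse()
    return path
```

Note that the starting column passed to the traceback is the column that achieves the maximum of d[n, j], not the maximum profit itself.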

Chapter 16 Introduction

Similar to dynamic programming. 

Used for optimization problems. 

Idea: When we have a choice to make, make the one that looks best right now. 
Make a locally optimal choice in hope of getting a globally optimal solution. 

Greedy algorithms don’t always yield an optimal solution. But sometimes they 
do. We’ll see a problem for which they do. Then we’ll look at some general 
characteristics of when greedy algorithms give optimal solutions. 

[We do not cover Huffman codes or matroids in these notes.] 


Activity selection 

n activities require exclusive use of a common resource. For example, scheduling 
the use of a classroom. 

Set of activities S = {a_1, ..., a_n}.

a_i needs resource during period [s_i, f_i), which is a half-open interval, where s_i =
start time and f_i = finish time.

Goal: Select the largest possible set of nonoverlapping (mutually compatible)
activities.

Note: Could have many other objectives: 

• Schedule room for longest time. 

• Maximize income rental fees. 

Example: S sorted by finish time: [Leave on board]

i     1   2   3   4   5   6    7    8    9
s_i   1   2   4   1   5   8    9    11   13
f_i   3   5   7   8   9   10   11   14   16

Maximum-size mutually compatible set: {a_1, a_3, a_6, a_8}.

Not unique: also {a_2, a_5, a_7, a_9}.

Optimal substructure of activity selection 

S_ij = {a_k ∈ S : f_i ≤ s_k < f_k ≤ s_j} [Leave on board]

     = activities that start after a_i finishes and finish before a_j starts.

[Figure: time line showing a_i ending at f_i, then a_k running from s_k to f_k,
then a_j starting at s_j.]

Activities in S_ij are compatible with

• all activities that finish by f_i, and

• all activities that start no earlier than s_j.

To represent the entire problem, add fictitious activities:

a_0 = [-∞, 0)
a_{n+1} = [∞, “∞ + 1”)

We don’t care about -∞ in a_0 or “∞ + 1” in a_{n+1}.

Then S = S_{0,n+1}.

Range for S_ij is 0 ≤ i, j ≤ n + 1.

Assume that activities are sorted by monotonically increasing finish time:
f_0 ≤ f_1 ≤ f_2 ≤ ... ≤ f_n < f_{n+1} .

Then i ≥ j ⇒ S_ij = ∅. [Leave on board]

• If there exists a_k ∈ S_ij:

  f_i ≤ s_k < f_k ≤ s_j < f_j ⇒ f_i < f_j .

• But i ≥ j ⇒ f_i ≥ f_j. Contradiction.

So only need to worry about S_ij with 0 ≤ i < j ≤ n + 1.

All other S_ij are ∅.

Suppose that a solution to S_ij includes a_k. Have 2 subproblems:

• S_ik (start after a_i finishes, finish before a_k starts)

• S_kj (start after a_k finishes, finish before a_j starts)

Solution to S_ij is (solution to S_ik) ∪ {a_k} ∪ (solution to S_kj).

Since a_k is in neither subproblem, and the subproblems are disjoint,

|solution to S_ij| = |solution to S_ik| + 1 + |solution to S_kj| .

If an optimal solution to S_ij includes a_k, then the solutions to S_ik and S_kj used
within this solution must be optimal as well. Use the usual cut-and-paste argument.

Let A_ij = optimal solution to S_ij.

So A_ij = A_ik ∪ {a_k} ∪ A_kj [leave on board], assuming:

• S_ij is nonempty, and

• we know a_k.


Recursive solution to activity selection 


c[i, j] = size of maximum-size subset of mutually compatible activities in S_ij.

• i ≥ j ⇒ S_ij = ∅ ⇒ c[i, j] = 0.

If S_ij ≠ ∅, suppose we know that a_k is in the subset. Then
c[i, j] = c[i, k] + 1 + c[k, j] .

But of course we don’t know which k to use, and so

\[
c[i,j] = \begin{cases}
0 & \text{if } S_{ij} = \emptyset , \\
\max_{\substack{i < k < j \\ a_k \in S_{ij}}} \{ c[i,k] + c[k,j] + 1 \} & \text{if } S_{ij} \ne \emptyset .
\end{cases}
\]

[The first two printings of the book omit the requirement that a_k ∈ S_ij from this
max computation. This error was corrected in the third printing.]

Why this range of k? Because S_ij = {a_k ∈ S : f_i ≤ s_k < f_k ≤ s_j} ⇒ a_k can’t be a_i
or a_j. Also need to ensure that a_k is actually in S_ij, since i < k < j is not sufficient
on its own to ensure this.
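Before simplifying, the recurrence can be evaluated directly. The following Python sketch memoizes c[i, j] and includes the membership test a_k ∈ S_ij. The function name max_compatible is made up for this example, the fictitious activities are modeled with infinities, and the sketch assumes all start times are nonnegative so that a_0 = [-∞, 0) precedes everything.

```python
import math
from functools import lru_cache

def max_compatible(s, f):
    """Size of a maximum-size set of mutually compatible activities, via c[i, j]."""
    n = len(s)
    # Add the fictitious activities a_0 = [-inf, 0) and a_{n+1} = [inf, "inf + 1").
    S = [-math.inf] + list(s) + [math.inf]
    F = [0] + list(f) + [math.inf]

    @lru_cache(maxsize=None)
    def c(i, j):
        best = 0                                   # value when S_ij is empty
        for k in range(i + 1, j):
            if F[i] <= S[k] and F[k] <= S[j]:      # a_k really is in S_ij
                best = max(best, c(i, k) + c(k, j) + 1)
        return best

    return c(0, n + 1)
```

On the nine-activity example above this evaluates to 4, matching the maximum-size sets listed there.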

From here, we could continue treating this like a dynamic-programming problem. 
We can simplify our lives, however. 


Theorem

Let S_ij ≠ ∅, and let a_m be the activity in S_ij with the earliest finish time: f_m =
min {f_k : a_k ∈ S_ij}. Then:

1. a_m is used in some maximum-size subset of mutually compatible activities
   of S_ij.

2. S_im = ∅, so that choosing a_m leaves S_mj as the only nonempty subproblem.

Proof

2. Suppose there is some a_k ∈ S_im. Then f_i ≤ s_k < f_k ≤ s_m < f_m ⇒ f_k < f_m.
   Then a_k ∈ S_ij and it has an earlier finish time than f_m, which contradicts our
   choice of a_m. Therefore, there is no a_k ∈ S_im ⇒ S_im = ∅.

1. Let A_ij be a maximum-size subset of mutually compatible activities in S_ij.
   Order activities in A_ij in monotonically increasing order of finish time.

   Let a_k be the first activity in A_ij.

   If a_k = a_m, done (a_m is used in a maximum-size subset).

   Otherwise, construct A'_ij = A_ij - {a_k} ∪ {a_m} (replace a_k by a_m).

   Claim

   Activities in A'_ij are disjoint.

   Proof Activities in A_ij are disjoint, a_k is the first activity in A_ij to finish,
   f_m ≤ f_k (so a_m doesn’t overlap anything else in A'_ij). ■ (claim)

   Since |A'_ij| = |A_ij| and A_ij is a maximum-size subset, so is A'_ij. ■ (theorem)

This is great:

                                          before theorem    after theorem
# of subproblems in optimal solution            2                 1
# of choices to consider                    j - i - 1             1

Now we can solve top down:

• To solve a problem S_ij,

  • Choose a_m ∈ S_ij with earliest finish time: the greedy choice.

  • Then solve S_mj.

What are the subproblems?

• Original problem is S_{0,n+1}.

• Suppose our first choice is a_{m_1}.

• Then next subproblem is S_{m_1,n+1}.

• Suppose next choice is a_{m_2}.

• Next subproblem is S_{m_2,n+1}.

• And so on.

Each subproblem is S_{m_i,n+1}, i.e., the last activities to finish.

And the subproblems chosen have finish times that increase.

Therefore, we can consider each activity just once, in monotonically increasing
order of finish time.

Easy recursive algorithm: Assumes activities already sorted by monotonically
increasing finish time. (If not, then sort in O(n lg n) time.) Return an optimal
solution for S_{i,n+1}:

[The first two printings had a procedure that purported to return an optimal solution
for S_ij, where j > i. This procedure had an error: it worked only when j = n + 1.
It turns out that it was called only with j = n + 1, however. To avoid this problem
altogether, the procedure was changed to the following in the third printing.]

Rec-Activity-Selector(s, f, i, n)
m ← i + 1
while m ≤ n and s_m < f_i        ▷ Find first activity in S_{i,n+1}.
    do m ← m + 1
if m ≤ n
   then return {a_m} ∪ Rec-Activity-Selector(s, f, m, n)
   else return ∅

Initial call: Rec-Activity-Selector(s, f, 0, n).

Idea: The while loop checks a_{i+1}, a_{i+2}, ..., a_n until it finds an activity a_m that is
compatible with a_i (need s_m ≥ f_i).

• If the loop terminates because a_m is found (m ≤ n), then recursively solve
  S_{m,n+1}, and return this solution, along with a_m.

• If the loop never finds a compatible a_m (m > n), then just return empty set.

Go through example given earlier. Should get {a_1, a_3, a_6, a_8}.

Time: Θ(n), since each activity is examined exactly once.

Can make this iterative. It’s already almost tail recursive.

Greedy-Activity-Selector(s, f, n)
A ← {a_1}
i ← 1
for m ← 2 to n
    do if s_m ≥ f_i
          then A ← A ∪ {a_m}
               i ← m        ▷ a_i is most recent addition to A
return A

Go through example given earlier. Should again get {a_1, a_3, a_6, a_8}.

Time: Θ(n).
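The iterative procedure translates almost line for line into Python. In this sketch the function name and the convention of returning 1-based activity indices are assumptions made for illustration, and the input is assumed to be already sorted by finish time.

```python
def greedy_activity_selector(s, f):
    """Return 1-based indices of the selected activities (input sorted by finish time)."""
    n = len(s)
    if n == 0:
        return []
    selected = [1]                   # a_1 finishes first, so it is always chosen
    i = 1                            # a_i is the most recent addition
    for m in range(2, n + 1):
        if s[m - 1] >= f[i - 1]:     # a_m starts no earlier than a_i finishes
            selected.append(m)
            i = m
    return selected
```

Running it on the example table is a quick way to confirm the set quoted above.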


Greedy strategy 

The choice that seems best at the moment is the one we go with.
What did we do for activity selection?

1. Determine the optimal substructure.

2. Develop a recursive solution. 

3. Prove that at any stage of recursion, one of the optimal choices is the greedy 
choice. Therefore, it’s always safe to make the greedy choice. 

4. Show that all but one of the subproblems resulting from the greedy choice are 
empty. 

5. Develop a recursive greedy algorithm. 

6. Convert it to an iterative algorithm. 

At first, it looked like dynamic programming. 

Typically, we streamline these steps. 

Develop the substructure with an eye toward 

• making the greedy choice, 

• leaving just one subproblem. 

For activity selection, we showed that the greedy choice implied that in S_ij, only i
varied, and j was fixed at n + 1.

We could have started out with a greedy algorithm in mind:

• Define S_i = {a_k ∈ S : f_i ≤ s_k}.

• Then show that the greedy choice (first a_m to finish in S_i), combined with
  optimal solution to S_m ⇒ optimal solution to S_i.

Typical streamlined steps: 

1. Cast the optimization problem as one in which we make a choice and are left 
with one subproblem to solve. 

2. Prove that there’s always an optimal solution that makes the greedy choice, so 
that the greedy choice is always safe. 

3. Show that greedy choice and optimal solution to subproblem ⇒ optimal
   solution to the problem.

No general way to tell if a greedy algorithm is optimal, but two key ingredients are 

1. greedy-choice property and 

2. optimal substructure. 

Greedy-choice property 

A globally optimal solution can be arrived at by making a locally optimal (greedy) 
choice. 

Dynamic programming: 

• Make a choice at each step. 

• Choice depends on knowing optimal solutions to subproblems. Solve
  subproblems first.

• Solve bottom-up.

Greedy:

• Make a choice at each step. 

• Make the choice before solving the subproblems. 

• Solve top-down. 

Typically show the greedy-choice property by what we did for activity selection: 

• Look at a globally optimal solution. 

• If it includes the greedy choice, done. 

• Else, modify it to include the greedy choice, yielding another solution that’s 
just as good. 

Can get efficiency gains from greedy-choice property. 

• Preprocess input to put it into greedy order. 

• Or, if dynamic data, use a priority queue. 

Optimal substructure 

Just show that optimal solution to subproblem and greedy choice ⇒ optimal
solution to problem.

Greedy vs. dynamic programming 

The knapsack problem is a good example of the difference. 

0-1 knapsack problem: 

• n items. 

• Item i is worth $v_i, weighs w_i pounds.

• Find a most valuable subset of items with total weight ≤ W.

• Have to either take an item or not take it; can’t take part of it.

Fractional knapsack problem: Like the 0-1 knapsack problem, but can take
fraction of an item.

Both have optimal substructure. 

But the fractional knapsack problem has the greedy-choice property, and the 0-1 
knapsack problem does not. 

To solve the fractional problem, rank items by value/weight: v_i/w_i.

Let v_i/w_i ≥ v_{i+1}/w_{i+1} for all i.

Fractional-Knapsack(v, w, W)
load ← 0
i ← 1
while load < W and i ≤ n
    do if w_i ≤ W - load
          then take all of item i
          else take (W - load)/w_i of item i
       add what was taken to load
       i ← i + 1

Time: O(n lg n) to sort, O(n) thereafter.
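A runnable version of this procedure might look like the sketch below. The return format (total value plus a list of (item, fraction) pairs with 1-based item numbers) is an assumption of the example, and the sort is done inside the function rather than assumed of the input.

```python
def fractional_knapsack(v, w, W):
    """Greedy by value per pound; returns (total value, [(item, fraction taken)])."""
    order = sorted(range(len(v)), key=lambda i: v[i] / w[i], reverse=True)
    load = 0.0
    value = 0.0
    taken = []
    for i in order:
        if load >= W:                              # knapsack is full
            break
        fraction = min(1.0, (W - load) / w[i])     # all of item i, or what still fits
        taken.append((i + 1, fraction))
        load += fraction * w[i]
        value += fraction * v[i]
    return value, taken
```

For instance, with values 60, 100, 120, weights 10, 20, 30, and W = 50, it takes items 1 and 2 whole and two thirds of item 3, for a total value of 240.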

Greedy doesn’t work for the 0-1 knapsack problem. Might get empty space, which
lowers the average value per pound of the items taken.

i         1    2     3
v_i       60   100   120
w_i       10   20    30
v_i/w_i   6    5     4

W = 50.

Greedy solution: 

• Take items 1 and 2. 

• value = 160, weight = 30. 

Have 20 pounds of capacity left over. 
Optimal solution: 

• Take items 2 and 3. 

• value = 220, weight = 50. 

No leftover capacity. 
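The gap between the two answers in this example can be checked mechanically. The sketch below compares a ratio-ordered greedy pass against an exhaustive search over all subsets. Both helper names are invented for this illustration, and the brute-force search is only a checking device for tiny inputs, not the dynamic-programming solution.

```python
from itertools import combinations

def greedy_01(v, w, W):
    """Take whole items in order of value per pound while they fit."""
    order = sorted(range(len(v)), key=lambda i: v[i] / w[i], reverse=True)
    load = value = 0
    chosen = []
    for i in order:
        if load + w[i] <= W:
            load += w[i]
            value += v[i]
            chosen.append(i + 1)
    return value, chosen

def best_01(v, w, W):
    """Exhaustively try every subset (exponential time; fine for three items)."""
    best_value, best_items = 0, ()
    for r in range(len(v) + 1):
        for combo in combinations(range(len(v)), r):
            if sum(w[i] for i in combo) <= W:
                val = sum(v[i] for i in combo)
                if val > best_value:
                    best_value, best_items = val, combo
    return best_value, [i + 1 for i in best_items]
```

On the table above, the greedy pass stops at value 160 while the exhaustive search finds 220.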

Solution to Exercise 16.1-2

The proposed approach—selecting the last activity to start that is compatible with 
all previously selected activities—is really the greedy algorithm but starting from 
the end rather than the beginning. 

Another way to look at it is as follows. We are given a set S = {a_1, a_2, ..., a_n}
of activities, where a_i = [s_i, f_i), and we propose to find an optimal solution by
selecting the last activity to start that is compatible with all previously selected
activities. Instead, let us create a set S' = {a'_1, a'_2, ..., a'_n}, where a'_i = [f_i, s_i).
That is, a'_i is a_i in reverse. Clearly, a subset {a_{i_1}, a_{i_2}, ..., a_{i_k}} ⊆ S is mutually
compatible if and only if the corresponding subset {a'_{i_1}, a'_{i_2}, ..., a'_{i_k}} ⊆ S' is also
mutually compatible. Thus, an optimal solution for S maps directly to an optimal
solution for S' and vice versa.

The proposed approach of selecting the last activity to start that is compatible with
all previously selected activities, when run on S, gives the same answer as the
greedy algorithm from the text (selecting the first activity to finish that is
compatible with all previously selected activities) when run on S'. The solution that
the proposed approach finds for S corresponds to the solution that the text’s greedy
algorithm finds for S', and so it is optimal.
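
As a rough illustration of the proposed approach, the following Python sketch scans activities from latest start to earliest start. The function name and the (start, finish) tuple representation are assumptions made for the example.

```python
def select_last_to_start(activities):
    """activities: list of (start, finish) half-open intervals [s, f).

    Returns a list of mutually compatible activities, chosen by repeatedly
    taking the last activity to start that is compatible with those chosen.
    """
    chosen = []
    earliest_start = float("inf")   # start of the most recently chosen activity
    # Consider activities from latest start to earliest start.
    for s, f in sorted(activities, key=lambda a: a[0], reverse=True):
        if f <= earliest_start:     # finishes before everything chosen so far starts
            chosen.append((s, f))
            earliest_start = s
    chosen.reverse()                # report in increasing time order
    return chosen
```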


Solution to Exercise 16.1-3 

Let S be the set of n activities. 

The “obvious” solution of using GREEDY-ACTIVITY-SELECTOR to find a
maximum-size set S_1 of compatible activities from S for the first lecture hall, then using
it again to find a maximum-size set S_2 of compatible activities from S − S_1 for the
second hall (and so on until all the activities are assigned), requires Θ(n^2) time in
the worst case.

There is a better algorithm, however, whose asymptotic time is just the time needed
to sort the activities by time: O(n lg n) time for arbitrary times, or possibly as fast
as O(n) if the times are small integers.

The general idea is to go through the activities in order of start time, assigning
each to any hall that is available at that time. To do this, move through the set
of events consisting of activities starting and activities finishing, in order of event
time. Maintain two lists of lecture halls: halls that are busy at the current event
time t (because they have been assigned an activity i that started at s_i ≤ t but
won’t finish until f_i > t) and halls that are free at time t. (As in the activity-
selection problem in Section 16.1, we are assuming that activity time intervals are
half open, i.e., that if s_i ≥ f_j, then activities i and j are compatible.) When t
is the start time of some activity, assign that activity to a free hall and move the
hall from the free list to the busy list. When t is the finish time of some activity,
move the activity’s hall from the busy list to the free list. (The activity is certainly
in some hall, because the event times are processed in order and the activity must
have started before its finish time t, hence must have been assigned to a hall.)

To avoid using more halls than necessary, always pick a hall that has already had 
an activity assigned to it, if possible, before picking a never-used hall. (This can be 
done by always working at the front of the free-halls list—putting freed halls onto 
the front of the list and taking halls from the front of the list—so that a new hall 
doesn’t come to the front and get chosen if there are previously-used halls.) 

This guarantees that the algorithm uses as few lecture halls as possible: The
algorithm will terminate with a schedule requiring m ≤ n lecture halls. Let activity i
be the first activity scheduled in lecture hall m. The reason that i was put in the
mth lecture hall is that the first m − 1 lecture halls were busy at time s_i. So at this
time there are m activities occurring simultaneously. Therefore any schedule must
use at least m lecture halls, so the schedule returned by the algorithm is optimal.

Run time: 

• Sort the 2n activity-starts/activity-ends events. (In the sorted order, an
activity-ending event should precede an activity-starting event that is at the same time.)
O(n lg n) time for arbitrary times, possibly O(n) if the times are restricted (e.g.,
to small integers).

• Process the events in O(n) time: Scan the 2n events, doing O(1) work for each
(moving a hall from one list to the other and possibly associating an activity
with it).

Total: O(n + time to sort)

[The idea of this algorithm is related to the rectangle-overlap algorithm in
Exercise 14.3-7.]
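
The event-sweep idea can be sketched in Python as below. This is one possible rendering under stated assumptions: every activity has start < finish, halls are numbered from 0, and the free list is a stack holding only previously used halls, so a never-used hall is opened only when the stack is empty. The name `assign_halls` is made up for the example.

```python
def assign_halls(activities):
    """activities: list of (start, finish) half-open intervals with start < finish.

    Returns (hall_of, halls_used), where hall_of[i] is the hall given to activity i.
    """
    events = []
    for i, (s, f) in enumerate(activities):
        events.append((s, 1, i))   # 1 = start event
        events.append((f, 0, i))   # 0 = finish event; sorts before a start at the same time
    events.sort()

    free = []                      # previously used halls that are currently free
    halls_used = 0
    hall_of = [None] * len(activities)
    for _, kind, i in events:
        if kind == 1:
            if free:
                hall_of[i] = free.pop()    # reuse a hall before opening a new one
            else:
                hall_of[i] = halls_used    # open a never-used hall
                halls_used += 1
        else:
            free.append(hall_of[i])        # the activity's hall becomes free
    return hall_of, halls_used
```

The sort dominates the running time; the scan does constant work per event.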


Solution to Exercise 16.1-4 

• For the approach of selecting the activity of least duration from those that are 
compatible with previously selected activities: 


i           1   2   3
s_i         0   2   3
f_i         3   4   6
duration    3   2   3

This approach selects just {a_2}, but the optimal solution selects {a_1, a_3}.
• For the approach of always selecting the compatible activity that overlaps the
fewest other remaining activities:

i                              1   2   3   4   5   6   7   8   9  10  11
s_i                            0   1   1   1   2   3   4   5   5   5   6
f_i                            2   3   3   3   4   5   6   7   7   7   8
# of overlapping activities    3   4   4   4   4   2   4   4   4   4   3

This approach first selects a_6, and after that choice it can select only two other
activities (one of a_1, a_2, a_3, a_4 and one of a_8, a_9, a_10, a_11). An optimal solution
is {a_1, a_5, a_7, a_11}.

• For the approach of always selecting the compatible remaining activity with 
the earliest start time, just add one more activity with the interval [0, 14) to 
the example in Section 16.1. It will be the first activity selected, and no other 
activities are compatible with it. 
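
The fewest-overlaps counterexample can be checked with a short Python sketch. The helper names are hypothetical, activities are referred to by 0-based index (so a_6 is index 5), and ties are broken by lowest index, which is an arbitrary choice made for the example.

```python
def overlaps(a, b):
    """Half-open intervals [s, f) overlap iff each starts before the other finishes."""
    return a[0] < b[1] and b[0] < a[1]

def fewest_overlaps_greedy(activities):
    """Return indices chosen by always taking the least-overlapping remaining activity."""
    remaining = set(range(len(activities)))
    chosen = []
    while remaining:
        def count(i):
            return sum(overlaps(activities[i], activities[j])
                       for j in remaining if j != i)
        pick = min(sorted(remaining), key=count)   # ties go to the lowest index
        chosen.append(pick)
        # Keep only activities compatible with the one just picked.
        remaining = {j for j in remaining
                     if j != pick and not overlaps(activities[pick], activities[j])}
    return chosen

def earliest_finish_greedy(activities):
    """The text's greedy rule: first activity to finish that is compatible."""
    chosen, last_finish = [], float("-inf")
    for s, f in sorted(activities, key=lambda a: a[1]):
        if s >= last_finish:
            chosen.append((s, f))
            last_finish = f
    return chosen

# The eleven activities from the fewest-overlaps table, as (s_i, f_i).
ACTIVITIES = [(0, 2), (1, 3), (1, 3), (1, 3), (2, 4), (3, 5),
              (4, 6), (5, 7), (5, 7), (5, 7), (6, 8)]
```

On `ACTIVITIES`, the fewest-overlaps rule picks index 5 first and ends with three activities, while the earliest-finish rule finds four.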


Solution to Exercise 16.2-2 


The solution is based on the optimal-substructure observation in the text: Let i
be the highest-numbered item in an optimal solution S for W pounds and items
1, ..., n. Then S' = S − {i} must be an optimal solution for W − w_i pounds
and items 1, ..., i − 1, and the value of the solution S is v_i plus the value of the
subproblem solution S'.

We can express this relationship in the following formula: Define c[i, w] to be the
value of the solution for items 1, ..., i and maximum weight w. Then

\[
c[i, w] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } w = 0 , \\
c[i-1, w] & \text{if } w_i > w , \\
\max(v_i + c[i-1, w - w_i],\; c[i-1, w]) & \text{if } i > 0 \text{ and } w \ge w_i .
\end{cases}
\]


The last case says that the value of a solution for i items either includes item i,
in which case it is v_i plus a subproblem solution for i − 1 items and the weight
excluding w_i, or doesn’t include item i, in which case it is a subproblem solution
for i − 1 items and the same weight. That is, if the thief picks item i, he takes v_i
value, and he can choose from items 1, ..., i − 1 up to the weight limit w − w_i,
and get c[i − 1, w − w_i] additional value. On the other hand, if he decides not to
take item i, he can choose from items 1, ..., i − 1 up to the weight limit w, and
get c[i − 1, w] value. The better of these two choices should be made.

The algorithm takes as inputs the maximum weight W, the number of items n, and
the two sequences v = ⟨v_1, v_2, ..., v_n⟩ and w = ⟨w_1, w_2, ..., w_n⟩. It stores the
c[i, j] values in a table c[0 .. n, 0 .. W] whose entries are computed in row-major
order. (That is, the first row of c is filled in from left to right, then the second row,
and so on.) At the end of the computation, c[n, W] contains the maximum value
the thief can take.
DYNAMIC-0-1-KNAPSACK(v, w, n, W)
  for w ← 0 to W
      do c[0, w] ← 0
  for i ← 1 to n
      do c[i, 0] ← 0
         for w ← 1 to W
             do if w_i ≤ w
                   then if v_i + c[i − 1, w − w_i] > c[i − 1, w]
                           then c[i, w] ← v_i + c[i − 1, w − w_i]
                           else c[i, w] ← c[i − 1, w]
                   else c[i, w] ← c[i − 1, w]

The set of items to take can be deduced from the c table by starting at c[n, W] and
tracing where the optimal values came from. If c[i, w] = c[i − 1, w], then item i is
not part of the solution, and we continue tracing with c[i − 1, w]. Otherwise item i
is part of the solution, and we continue tracing with c[i − 1, w − w_i].

The above algorithm takes Θ(nW) time total:

• Θ(nW) to fill in the c table: (n + 1) · (W + 1) entries, each requiring Θ(1) time
to compute.

• O(n) time to trace the solution (since it starts in row n of the table and moves
up one row at each step).
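
A Python sketch of the table fill and the traceback follows. It is an illustration under assumptions: weights and capacity are positive integers, the lists are 0-based while items are reported with the 1-based numbering used above, and the name `knapsack_01` is made up.

```python
def knapsack_01(values, weights, capacity):
    """Return (best_value, chosen_items), with items numbered from 1."""
    n = len(values)
    # c[i][w] = best value using items 1..i with weight limit w; row 0 and column 0 are 0.
    c = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        vi, wi = values[i - 1], weights[i - 1]
        for w in range(1, capacity + 1):
            c[i][w] = c[i - 1][w]                      # default: skip item i
            if wi <= w and vi + c[i - 1][w - wi] > c[i][w]:
                c[i][w] = vi + c[i - 1][w - wi]        # better to take item i
    # Trace back from c[n][W], moving up one row per step.
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if c[i][w] != c[i - 1][w]:
            chosen.append(i)
            w -= weights[i - 1]
    chosen.reverse()
    return c[n][capacity], chosen
```

For values 60, 100, 120 with weights 10, 20, 30 and W = 50, the sketch returns value 220 with items 2 and 3.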


Solution to Exercise 16.2-4 

The optimal strategy is the obvious greedy one. Starting with a full tank of gas,
Professor Midas should go to the farthest gas station he can get to within n miles 
of Newark. Fill up there. Then go to the farthest gas station he can get to within n 
miles of where he filled up, and fill up there, and so on. 

Looked at another way, at each gas station, Professor Midas should check whether 
he can make it to the next gas station without stopping at this one. If he can, skip 
this one. If he cannot, then fill up. Professor Midas doesn’t need to know how 
much gas he has or how far the next station is to implement this approach, since at 
each fillup, he can determine which is the next station at which he’ll need to stop. 

This problem has optimal substructure. Suppose there are m possible gas stations.
Consider an optimal solution with s stations and whose first stop is at the kth gas
station. Then the rest of the optimal solution must be an optimal solution to the
subproblem of the remaining m − k stations. Otherwise, if there were a better
solution to the subproblem, i.e., one with fewer than s − 1 stops, we could use it to
come up with a solution with fewer than s stops for the full problem, contradicting
our supposition of optimality.

This problem also has the greedy-choice property. Suppose there are k gas stations
beyond the start that are within n miles of the start. The greedy solution chooses
the kth station as its first stop. No station beyond the kth works as a first stop,
since Professor Midas runs out of gas first. If a solution chooses a station j < k as
its first stop, then Professor Midas could choose the kth station instead, having at
least as much gas when he leaves the kth station as if he’d chosen the jth station.
Therefore, he would get at least as far without filling up again if he had chosen the
kth station.

If there are m gas stations on the map, Midas needs to inspect each one just once.
The running time is O(m).
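
The "skip this station if the next one is reachable" rule can be sketched in Python. The names below are assumptions for illustration: `stations` holds sorted distances from the start, `destination` is the trip length, and `tank_range` plays the role of the n miles per tank in the exercise.

```python
def choose_stops(stations, destination, tank_range):
    """Return the station distances at which to fill up, or None if the trip is impossible.

    stations must be sorted in increasing order and lie before destination.
    """
    stops = []
    position = 0                        # where the tank was last full
    points = list(stations) + [destination]
    for here, nxt in zip(points, points[1:]):
        if here - position > tank_range:
            return None                 # could not even reach this station
        if nxt - position > tank_range:
            stops.append(here)          # cannot skip this station: fill up
            position = here
    if destination - position > tank_range:
        return None
    return stops
```

Each station is inspected once, matching the linear running time stated above.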


Solution to Exercise 16.2-6 

Use a linear-time median algorithm to calculate the median m of the v_i/w_i
ratios. Next, partition the items into three sets: G = {i : v_i/w_i > m}, E =
{i : v_i/w_i = m}, and L = {i : v_i/w_i < m}; this step takes linear time. Compute
W_G = Σ_{i∈G} w_i and W_E = Σ_{i∈E} w_i, the total weight of the items in sets G and E,
respectively.

• If W_G > W, then do not yet take any items in set G, and instead recurse on the
set of items G and knapsack capacity W.

• Otherwise (W_G ≤ W), take all items in set G, and take as much of the items in
set E as will fit in the remaining capacity W − W_G.

• If W_G + W_E ≥ W (i.e., there is no capacity left after taking all the items in
set G and all the items in set E that fit in the remaining capacity W − W_G), then
we are done.

• Otherwise (W_G + W_E < W), then after taking all the items in sets G and E,
recurse on the set of items L and knapsack capacity W − W_G − W_E.

To analyze this algorithm, note that each recursive call takes linear time, exclusive
of the time for a recursive call that it may make. When there is a recursive call, there
is just one, and it’s for a problem of at most half the size. Thus, the running time is
given by the recurrence T(n) ≤ T(n/2) + Θ(n), whose solution is T(n) = O(n).
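
The structure of this median-partition algorithm can be sketched in Python. One caveat is stated loudly: the sketch uses `statistics.median_low`, which sorts its input, as a stand-in for a true linear-time selection routine, so the code shows the recursion (written as a loop) but does not itself achieve O(n). The function name and the (value, weight) tuple format are also assumptions.

```python
from statistics import median_low

def fractional_knapsack_value(items, capacity):
    """items: list of (value, weight) pairs with weight > 0. Returns the best total value."""
    total = 0.0
    while items and capacity > 0:
        # Stand-in for a linear-time median: median_low sorts, so this step is O(n lg n).
        m = median_low([v / w for v, w in items])
        G = [(v, w) for v, w in items if v / w > m]
        E = [(v, w) for v, w in items if v / w == m]
        L = [(v, w) for v, w in items if v / w < m]
        WG = sum(w for _, w in G)
        WE = sum(w for _, w in E)
        if WG > capacity:
            items = G                 # recurse on G with the same capacity
            continue
        total += sum(v for v, _ in G) # take all of G
        capacity -= WG
        if WE >= capacity:
            return total + m * capacity   # fill what is left from E, then done
        total += m * WE               # take all of E
        capacity -= WE
        items = L                     # recurse on L with the reduced capacity
    return total
```

Because `median_low` returns an actual ratio from the list, set E is never empty, so G and L each hold at most half the items and the loop always makes progress.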


Solution to Exercise 16.2-7 

Sort A and B into monotonically decreasing order. 

Here’s a proof that this method yields an optimal solution. Consider any indices i
and j such that i < j, and consider the terms a_i^{b_i} and a_j^{b_j}. We want to show that
it is no worse to include these terms in the payoff than to include a_i^{b_j} and a_j^{b_i}, i.e.,
that a_i^{b_i} a_j^{b_j} ≥ a_i^{b_j} a_j^{b_i}. Since A and B are sorted into monotonically decreasing
order and i < j, we have a_i ≥ a_j and b_i ≥ b_j. Since a_i and a_j are positive
and b_i − b_j is nonnegative, we have a_i^{b_i − b_j} ≥ a_j^{b_i − b_j}. Multiplying both sides by
a_i^{b_j} a_j^{b_j} yields a_i^{b_i} a_j^{b_j} ≥ a_i^{b_j} a_j^{b_i}.

Since the order of multiplication doesn’t matter, sorting A and B into
monotonically increasing order works as well.
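
A brute-force check on a small instance can make the claim concrete. The sketch below is illustrative: the function names are made up, the inputs are assumed to be positive integers so the products are exact, and the exhaustive search is only practical for very short lists.

```python
from itertools import permutations
from math import prod

def payoff(a, b):
    """Product of a[i] ** b[i] over all i."""
    return prod(x ** y for x, y in zip(a, b))

def sorted_payoff(a, b):
    """Pair the largest base with the largest exponent, and so on down."""
    return payoff(sorted(a, reverse=True), sorted(b, reverse=True))

def best_payoff_brute_force(a, b):
    """Try every reordering of b against a fixed order of a (tiny inputs only)."""
    return max(payoff(a, perm) for perm in permutations(b))
```

For A = [2, 3, 5] and B = [1, 4, 2], both functions give 5^4 · 3^2 · 2^1 = 11250.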
Solution to Exercise 16.4-2

We need to show three things to prove that (S, ℐ) is a matroid:

1. S is finite. That’s because S is the set of m columns of matrix T.

2. ℐ is hereditary. That’s because if B ∈ ℐ, then the columns in B are linearly
independent. If A ⊆ B, then the columns of A must also be linearly independent,
and so A ∈ ℐ.

3. (S, ℐ) satisfies the exchange property. To see why, let us suppose that A, B ∈ ℐ
and |A| < |B|.

We will use the following properties of matrices:

• The rank of a matrix is the number of columns in a maximal set of linearly
independent columns (see page 731 of the text). The rank is also equal to the
dimension of the column space of the matrix.

• If the column space of matrix B is a subspace of the column space of
matrix A, then rank(B) ≤ rank(A).

Because the columns in A are linearly independent, if we take just these
columns as a matrix A, we have that rank(A) = |A|. Similarly, if we take
the columns of B as a matrix B, we have rank(B) = |B|. Since |A| < |B|, we
have rank(A) < rank(B).

We shall show that there is some column b ∈ B that is not a linear combination
of the columns in A, and so A ∪ {b} is linearly independent. The proof proceeds
by contradiction. Assume that each column in B is a linear combination of
the columns of A. That means that any vector that is a linear combination
of the columns of B is also a linear combination of the columns of A, and
so, treating the columns of A and B as matrices, the column space of B is a
subspace of the column space of A. By the second property above, we have
rank(B) ≤ rank(A). But we have already shown that rank(A) < rank(B), a
contradiction. Therefore, some column in B is not a linear combination of the
columns of A, and (S, ℐ) satisfies the exchange property.


Solution to Exercise 16.4-3 

We need to show three things to prove that (S, ℐ') is a matroid:

1. S is finite. We are given that.

2. ℐ' is hereditary. Suppose that B' ∈ ℐ' and A' ⊆ B'. Since B' ∈ ℐ', there is
some maximal set B ∈ ℐ such that B ⊆ S − B'. But A' ⊆ B' implies that
S − B' ⊆ S − A', and so B ⊆ S − B' ⊆ S − A'. Thus, there exists a maximal
set B ∈ ℐ such that B ⊆ S − A', proving that A' ∈ ℐ'.

3. (S, ℐ') satisfies the exchange property. We start with two preliminary facts
about sets. The proofs of these facts are omitted.
Fact 1: |X − Y| = |X| − |X ∩ Y|.

Fact 2: Let S be the universe of elements. If X − Y ⊆ Z and Z ⊆ S − Y, then
|X ∩ Z| = |X| − |X ∩ Y|.

To show that (S, ℐ') satisfies the exchange property, let us assume that A' ∈ ℐ',
B' ∈ ℐ', and that |A'| < |B'|. We need to show that there exists some x ∈
B' − A' such that A' ∪ {x} ∈ ℐ'. Because A' ∈ ℐ' and B' ∈ ℐ', there are maximal
sets A ⊆ S − A' and B ⊆ S − B' such that A ∈ ℐ and B ∈ ℐ.

Define the set X = B' − A' − A, so that X consists of elements in B' but not in
A' or A.

If X is nonempty, then let x be any element of X. By how we defined set X, we
know that x ∈ B' and x ∉ A', so that x ∈ B' − A'. Since x ∉ A, we also have
that A ⊆ S − A' − {x} = S − (A' ∪ {x}), and so A' ∪ {x} ∈ ℐ'.

If X is empty, the situation is more complicated. Because |A'| < |B'|, we have
that B' − A' ≠ ∅, and so X being empty means that B' − A' ⊆ A.

Claim

There is an element y ∈ B − A' such that (A − B') ∪ {y} ∈ ℐ.

Proof First, observe that because A − B' ⊆ A and A ∈ ℐ, we have that A − B' ∈
ℐ. Similarly, B − A' ⊆ B and B ∈ ℐ, and so B − A' ∈ ℐ. If we show
that |A − B'| < |B − A'|, the assumption that (S, ℐ) is a matroid proves the
existence of y.

Because B' − A' ⊆ A and A ⊆ S − A', we can apply Fact 2 to conclude
that |B' ∩ A| = |B'| − |B' ∩ A'|. We claim that |B ∩ A'| ≤ |A' − B'|. To
see why, observe that A' − B' = A' ∩ (S − B') and B ⊆ S − B', and so
B ∩ A' ⊆ (S − B') ∩ A' = A' ∩ (S − B') = A' − B'. Applying Fact 1, we
see that |A' − B'| = |A'| − |A' ∩ B'| = |A'| − |B' ∩ A'|, and hence |B ∩ A'| ≤
|A'| − |B' ∩ A'|.


Now, we have

\begin{align*}
|A'| &< |B'| && \text{(by assumption)} \\
|A'| - |B' \cap A'| &< |B'| - |B' \cap A'| && \text{(subtracting same quantity)} \\
|B \cap A'| &< |B'| - |B' \cap A'| && (|B \cap A'| \le |A'| - |B' \cap A'|) \\
|B \cap A'| &< |B' \cap A| && (|B' \cap A| = |B'| - |B' \cap A'|) \\
|B| - |B \cap A'| &> |A| - |B' \cap A| && (|A| = |B|) \\
|B - A'| &> |A - B'| && \text{(Fact 1)}
\end{align*}

■ (claim)


Now we know there is an element y ∈ B − A' such that (A − B') ∪ {y} ∈ ℐ.
Moreover, we claim that y ∉ A. To see why, we know that by the exchange
property, y ∉ A − B'. In order for y to be in A, it would have to be in A ∩ B'.
But y ∈ B, which means that y ∉ B', and hence y ∉ A ∩ B'. Therefore y ∉ A.

We keep applying the exchange property, adding elements in B − A' to A − B',
maintaining that the set we get is in ℐ. Continue adding these elements until we
get a set, say C, such that |C| = |A|. Once |C| = |A|, there is some element
x ∈ A that we have not added into C. We know this because the element y that
we first added into C was not in A, and so some element of A must be left over.
The set C is maximal, because it has the same cardinality as A, which is
maximal, and C ∈ ℐ. Since C started with all elements in A − B' and we added only
elements in B − A', at no time did C receive an element in A'. Because we also
never added x to C, we have that C ⊆ S − A' − {x} = S − (A' ∪ {x}), which
proves that A' ∪ {x} ∈ ℐ', as we needed to show.


Solution to Problem 16-1 

Before we go into the various parts of this problem, let us first prove once and for 
all that the coin-changing problem has optimal substructure. 

Suppose we have an optimal solution for a problem of making change for n cents,
and we know that this optimal solution uses a coin whose value is c cents; let this
optimal solution use k coins. We claim that this optimal solution for the problem
of n cents must contain within it an optimal solution for the problem of n − c cents.
We use the usual cut-and-paste argument. Clearly, there are k − 1 coins in the
solution to the n − c cents problem used within our optimal solution to the n cents
problem. If we had a solution to the n − c cents problem that used fewer than k − 1
coins, then we could use this solution to produce a solution to the n cents problem
that uses fewer than k coins, which contradicts the optimality of our solution.

a. A greedy algorithm to make change using quarters, dimes, nickels, and pennies 
works as follows: 

• Give q = ⌊n/25⌋ quarters. That leaves n_q = n mod 25 cents to make
change.

• Then give d = ⌊n_q/10⌋ dimes. That leaves n_d = n_q mod 10 cents to make
change.

• Then give k = ⌊n_d/5⌋ nickels. That leaves n_k = n_d mod 5 cents to make
change.

• Finally, give p = n_k pennies.

An equivalent formulation is the following. The problem we wish to solve is
making change for n cents. If n = 0, the optimal solution is to give no coins.
If n > 0, determine the largest coin whose value is less than or equal to n.
Let this coin have value c. Give one such coin, and then recursively solve the
subproblem of making change for n − c cents.

To prove that this algorithm yields an optimal solution, we first need to show
that the greedy-choice property holds, that is, that some optimal solution to
making change for n cents includes one coin of value c, where c is the largest
coin value such that c ≤ n. Consider some optimal solution. If this optimal
solution includes a coin of value c, then we are done. Otherwise, this optimal
solution does not include a coin of value c. We have four cases to consider:

• If 1 ≤ n < 5, then c = 1. A solution may consist only of pennies, and so it
must contain the greedy choice.

• If 5 ≤ n < 10, then c = 5. By supposition, this optimal solution does not
contain a nickel, and so it consists of only pennies. Replace five pennies by
one nickel to give a solution with four fewer coins.

• If 10 ≤ n < 25, then c = 10. By supposition, this optimal solution does not
contain a dime, and so it contains only nickels and pennies. Some subset of
the nickels and pennies in this solution adds up to 10 cents, and so we can
replace these nickels and pennies by a dime to give a solution with (between
1 and 9) fewer coins.

• If 25 ≤ n, then c = 25. By supposition, this optimal solution does not
contain a quarter, and so it contains only dimes, nickels, and pennies. If
it contains three dimes, we can replace these three dimes by a quarter and
a nickel, giving a solution with one fewer coin. If it contains at most two
dimes, then some subset of the dimes, nickels, and pennies adds up to 25
cents, and so we can replace these coins by one quarter to give a solution
with fewer coins.

Thus, we have shown that there is always an optimal solution that includes the
greedy choice, and that we can combine the greedy choice with an optimal
solution to the remaining subproblem to produce an optimal solution to our original
problem. Therefore, the greedy algorithm produces an optimal solution.

For the algorithm that chooses one coin at a time and then recurses on
subproblems, the running time is Θ(k), where k is the number of coins used in an
optimal solution. Since k ≤ n, the running time is O(n). For our first
description of the algorithm, we perform a constant number of calculations (since there
are only 4 coin types), and the running time is O(1).
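
The constant-time formulation for quarters, dimes, nickels, and pennies amounts to three integer divisions. A minimal Python sketch, with a made-up function name and assuming n is a nonnegative integer:

```python
def make_change_us(n):
    """Return (quarters, dimes, nickels, pennies) for n >= 0 cents."""
    quarters, rest = divmod(n, 25)      # q = floor(n/25), n_q = n mod 25
    dimes, rest = divmod(rest, 10)      # d = floor(n_q/10), n_d = n_q mod 10
    nickels, pennies = divmod(rest, 5)  # k = floor(n_d/5), p = n_d mod 5
    return quarters, dimes, nickels, pennies
```

For 99 cents the sketch gives three quarters, two dimes, no nickels, and four pennies.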

b. When the coin denominations are c^0, c^1, ..., c^k, the greedy algorithm to make
change for n cents works by finding the denomination c^j such that j =
max{0 ≤ i ≤ k : c^i ≤ n}, giving one coin of denomination c^j, and
recursing on the subproblem of making change for n − c^j cents. (An equivalent,
but more efficient, algorithm is to give ⌊n/c^k⌋ coins of denomination c^k and
⌊(n mod c^{i+1})/c^i⌋ coins of denomination c^i for i = 0, 1, ..., k − 1.)

To show that the greedy algorithm produces an optimal solution, we start by 
proving the following lemma: 

Lemma

For i = 0, 1, ..., k, let a_i be the number of coins of denomination c^i used in
an optimal solution to the problem of making change for n cents. Then for
i = 0, 1, ..., k − 1, we have a_i < c.

Proof If a_i ≥ c for some 0 ≤ i < k, then we can improve the solution by using
one more coin of denomination c^{i+1} and c fewer coins of denomination c^i. The
amount for which we make change remains the same, but we use c − 1 > 0
fewer coins. ■ (lemma)

To show that the greedy solution is optimal, we show that any non-greedy
solution is not optimal. As above, let j = max{0 ≤ i ≤ k : c^i ≤ n}, so that the
greedy solution uses at least one coin of denomination c^j. Consider a non-greedy
solution, which must use no coins of denomination c^j or higher. Let the
non-greedy solution use a_i coins of denomination c^i, for i = 0, 1, ..., j − 1;
thus we have Σ_{i=0}^{j−1} a_i c^i = n. Since n ≥ c^j, we have that Σ_{i=0}^{j−1} a_i c^i ≥ c^j.




Now suppose that the non-greedy solution is optimal. By the above lemma,
a_i ≤ c − 1 for i = 0, 1, ..., j − 1. Thus,

\begin{align*}
\sum_{i=0}^{j-1} a_i c^i &\le \sum_{i=0}^{j-1} (c-1) c^i \\
&= (c-1) \sum_{i=0}^{j-1} c^i \\
&= (c-1) \cdot \frac{c^j - 1}{c - 1} \\
&= c^j - 1 \\
&< c^j ,
\end{align*}

which contradicts our earlier assertion that Σ_{i=0}^{j−1} a_i c^i ≥ c^j. We conclude that
the non-greedy solution is not optimal.


Since any algorithm that does not produce the greedy solution fails to be
optimal, only the greedy algorithm produces the optimal solution.

The problem did not ask for the running time, but for the more efficient
greedy-algorithm formulation, it is easy to see that the running time is O(k), since we
have to perform at most k each of the division, floor, and mod operations.
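
The more efficient formulation for denominations c^0, c^1, ..., c^k can be sketched as follows. The function name is hypothetical, and the sketch assumes integers with n ≥ 0, c > 1, and k ≥ 0.

```python
def make_change_powers(n, c, k):
    """Return counts, where counts[i] is the number of coins of denomination c**i."""
    counts = [0] * (k + 1)
    counts[k] = n // c ** k                       # floor(n / c^k) of the largest coin
    for i in range(k):
        counts[i] = (n % c ** (i + 1)) // c ** i  # floor((n mod c^(i+1)) / c^i)
    return counts
```

With c = 3, k = 3, and n = 100, the denominations are 1, 3, 9, and 27, and the sketch gives three 27s, two 9s, no 3s, and one 1. Every count below the top denomination is less than c, as the lemma requires.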


c. With actual U.S. coins, we can use coins of denomination 1, 10, and 25. When 
n = 30 cents, the greedy solution gives one quarter and five pennies, for a total 
of six coins. The non-greedy solution of three dimes is better. 

The smallest integer numbers we can use are 1, 3, and 4. When n = 6 cents, the 
greedy solution gives one 4-cent coin and two 1-cent coins, for a total of three 
coins. The non-greedy solution of two 3-cent coins is better. 


d. Since we have optimal substructure, dynamic programming might apply. And 
indeed it does. 


Let us define c[j] to be the minimum number of coins we need to make change
for j cents. Let the coin denominations be d_1, d_2, ..., d_k. Since one of the
coins is a penny, there is a way to make change for any amount j ≥ 1.

Because of the optimal substructure, if we knew that an optimal solution for
the problem of making change for j cents used a coin of denomination d_i, we
would have c[j] = 1 + c[j − d_i]. As base cases, we have that c[j] = 0 for all
j ≤ 0.

To develop a recursive formulation, we have to check all denominations, giving

\[
c[j] =
\begin{cases}
0 & \text{if } j \le 0 , \\
1 + \min_{1 \le i \le k} \{ c[j - d_i] \} & \text{if } j \ge 1 .
\end{cases}
\]


We can compute the c[j] values in order of increasing j by using a table. The
following procedure does so, producing a table c[1 .. n]. It avoids even
examining c[j] for j < 0 by ensuring that j ≥ d_i before looking up c[j − d_i]. The
procedure also produces a table denom[1 .. n], where denom[j] is the
denomination of a coin used in an optimal solution to the problem of making change
for j cents.
COMPUTE-CHANGE(n, d, k)
  for j ← 1 to n
      do c[j] ← ∞
         for i ← 1 to k
             do if j ≥ d_i and 1 + c[j − d_i] < c[j]
                   then c[j] ← 1 + c[j − d_i]
                        denom[j] ← d_i
  return c and denom

This procedure obviously runs in Θ(nk) time.

We use the following procedure to output the coins used in the optimal solution 
computed by Compute-Change: 

GIVE-CHANGE(j, denom)
  if j > 0
     then give one coin of denomination denom[j]
          GIVE-CHANGE(j − denom[j], denom)

The initial call is GIVE-CHANGE(n, denom). Since the value of the first
parameter decreases in each recursive call, this procedure runs in O(n) time.
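
The two procedures can be sketched together in Python. The sketch makes a few choices of its own that are not in the pseudocode: the tables are 0-based lists with an explicit c[0] = 0 entry for the base case, the denominations are passed as a list (assumed to include 1), and the output routine is written as a loop that returns the coins rather than as a recursive procedure.

```python
import math

def compute_change(n, denominations):
    """Return (c, denom): c[j] is the fewest coins for j cents, and denom[j] is
    one coin used in an optimal solution for j cents."""
    c = [0] + [math.inf] * n          # c[0] = 0 is the base case
    denom = [None] * (n + 1)
    for j in range(1, n + 1):
        for d in denominations:
            if j >= d and 1 + c[j - d] < c[j]:
                c[j] = 1 + c[j - d]
                denom[j] = d
    return c, denom

def give_change(j, denom):
    """Return the list of coins given for j cents."""
    coins = []
    while j > 0:
        coins.append(denom[j])
        j -= denom[j]
    return coins
```

With denominations 1, 10, and 25 and n = 30, the tables lead to three dimes; with denominations 1, 3, and 4 and n = 6, they lead to two 3-cent coins, matching the counterexamples in part (c).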
Chapter 17 overview

Amortized analysis 

• Analyze a sequence of operations on a data structure. 

• Goal: Show that although some individual operations may be expensive, on 
average the cost per operation is small. 

Average in this context does not mean that we’re averaging over a distribution of 
inputs. 

• No probability is involved. 

• We’re talking about average cost in the worst case. 

Organization 

We’ll look at 3 methods: 

• aggregate analysis 

• accounting method 

• potential method 

Using 3 examples: 

• stack with multipop operation 

• binary counter 

• dynamic tables (later on) 


Aggregate analysis 

Stack operations 

• PUSH(S, x): O(1) each ⇒ O(n) for any sequence of n operations.

• POP(S): O(1) each ⇒ O(n) for any sequence of n operations.

• MULTIPOP(S, k)
    while S is not empty and k > 0
        do POP(S)
           k ← k − 1

Running time of Multipop: 

• Linear in # of POP operations. 

• Let each Push/Pop cost 1. 

• # of iterations of while loop is min(s, k), where s = # of objects on stack.

• Therefore, total cost = min(s, k).

Sequence of n Push, Pop, Multipop operations: 

• Worst-case cost of Multipop is O(n). 

• Have n operations. 

• Therefore, worst-case cost of sequence is O(n^2).

Observation 

• Each object can be popped only once per time that it’s pushed. 

• Have ≤ n PUSHes ⇒ ≤ n POPs, including those in MULTIPOP.

• Therefore, total cost = O(n).

• Average over the n operations ⇒ O(1) per operation on average.

Again, notice no probability.

• Showed worst-case O(n) cost for sequence.

• Therefore, O(1) per operation on average.

This technique is called aggregate analysis. 
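
A small Python sketch can make the aggregate bound tangible by counting actual costs. The class name `CountingStack` and its `cost` field are assumptions for illustration; each PUSH or POP, including those inside MULTIPOP, is charged 1.

```python
class CountingStack:
    """Stack that records the actual cost (1 per PUSH or POP) of its operations."""

    def __init__(self):
        self.items = []
        self.cost = 0

    def push(self, x):
        self.items.append(x)
        self.cost += 1

    def pop(self):
        self.cost += 1
        return self.items.pop()

    def multipop(self, k):
        # Pops min(s, k) objects, where s is the current stack size.
        while self.items and k > 0:
            self.pop()
            k -= 1
```

For example, 7 pushes, a MULTIPOP of 5, 2 more pushes, and a MULTIPOP of 10 make 11 operations with total cost 7 + 5 + 2 + 4 = 18, which is below 2 · 11.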

Binary counter 

• k-bit binary counter A[0 .. k − 1] of bits, where A[0] is the least significant bit
and A[k − 1] is the most significant bit.

• Counts upward from 0.

• Value of counter is \sum_{i=0}^{k-1} A[i] \cdot 2^i.

• Initially, counter value is 0, so A[0 .. k − 1] = 0.

• To increment, add 1 (mod 2^k):

INCREMENT(A, k)
  i ← 0
  while i < k and A[i] = 1
      do A[i] ← 0
         i ← i + 1
  if i < k
     then A[i] ← 1
Example: k = 3

[Underlined bits flip. Show costs later.]

counter      A
value      2 1 0    cost (cumulative)

0          0 0 0     0
1          0 0 1     1
2          0 1 0     3
3          0 1 1     4
4          1 0 0     7
5          1 0 1     8
6          1 1 0    10
7          1 1 1    11
0          0 0 0    14
⋮            ⋮      15


Cost of INCREMENT = Θ(# of bits flipped).

Analysis: Each call could flip k bits, so n INCREMENTs takes O(nk) time.

Observation 

Not every bit flips every time. 

[Show costs from above.] 

bit      flips how often    times in n INCREMENTs

0        every time         n
1        1/2 the time       ⌊n/2⌋
2        1/4 the time       ⌊n/4⌋
i        1/2^i the time     ⌊n/2^i⌋
i ≥ k    never              0

Therefore, total # of flips is

\begin{align*}
\sum_{i=0}^{k-1} \lfloor n/2^i \rfloor &< n \sum_{i=0}^{\infty} 1/2^i \\
&= n \left( \frac{1}{1 - 1/2} \right) \\
&= 2n .
\end{align*}

Therefore, n INCREMENTs costs O(n).
Average cost per operation = O(1).
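
The flip count can be checked by simulation. In this Python sketch the counter is a list of bits with A[0] least significant; having `increment` return the number of bits it flipped, and the helper `total_flips`, are additions made for the example.

```python
def increment(A):
    """Increment counter A in place (A[0] is the least significant bit).

    Returns the number of bits flipped.
    """
    i = 0
    flips = 0
    while i < len(A) and A[i] == 1:
        A[i] = 0            # reset a trailing 1
        flips += 1
        i += 1
    if i < len(A):
        A[i] = 1            # set the first 0, unless the counter overflowed
        flips += 1
    return flips

def total_flips(k, n):
    """Total bits flipped by n INCREMENTs on a k-bit counter that starts at 0."""
    A = [0] * k
    return sum(increment(A) for _ in range(n))
```

For k = 3 the running totals reproduce the cost column of the example (14 after eight increments, 15 after nine), and the total always stays below 2n.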
Accounting method

Assign different charges to different operations. 

• Some are charged more than actual cost. 

• Some are charged less. 

Amortized cost = amount we charge. 

When amortized cost > actual cost, store the difference on specific objects in the 
data structure as credit. 

Use credit later to pay for operations whose actual cost > amortized cost. 

Differs from aggregate analysis: 

• In the accounting method, different operations can have different costs. 

• In aggregate analysis, all operations have same cost. 

Need credit to never go negative. 

• Otherwise, have a sequence of operations for which the amortized cost is not 
an upper bound on actual cost. 

• Amortized cost would tell us nothing. 

Let c_i = actual cost of ith operation ,
    ĉ_i = amortized cost of ith operation .

Then require \sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i for all sequences of n operations.

Total credit stored = \sum_{i=1}^{n} \hat{c}_i - \sum_{i=1}^{n} c_i \ge 0 (had better be).


Stack 


operation    actual cost    amortized cost

PUSH         1              2
POP          1              0
MULTIPOP     min(k, s)      0


Intuition: When pushing an object, pay $2. 

• $1 pays for the PUSH. 

• $1 is prepayment for it being popped by either Pop or Multipop. 

• Since each object has $1, which is credit, the credit can never go negative. 

• Therefore, total amortized cost, = O(n), is an upper bound on total actual cost. 
Binary counter

Charge $2 to set a bit to 1. 

• $1 pays for setting a bit to 1. 

• $1 is prepayment for flipping it back to 0. 

• Have $1 of credit for every 1 in the counter. 

• Therefore, credit ≥ 0.

Amortized cost of INCREMENT:

• Cost of resetting bits to 0 is paid by credit.

• At most 1 bit is set to 1.

• Therefore, amortized cost ≤ $2.

• For n operations, amortized cost = O(n).


Potential method 

Like the accounting method, but think of the credit as potential stored with the 
entire data structure. 

• Accounting method stores credit with specific objects. 

• Potential method stores potential in the data structure as a whole. 

• Can release potential to pay for future operations. 

• Most flexible of the amortized analysis methods. 

Let D_i = data structure after ith operation ,
    D_0 = initial data structure ,
    c_i = actual cost of ith operation ,
    ĉ_i = amortized cost of ith operation .

Potential function Φ : D_i → ℝ

Φ(D_i) is the potential associated with data structure D_i.

\begin{align*}
\hat{c}_i &= c_i + \Phi(D_i) - \Phi(D_{i-1}) \\
&= c_i + \Delta\Phi(D_i) ,
\end{align*}

where ΔΦ(D_i) is the increase in potential due to the ith operation.

\begin{align*}
\text{Total amortized cost} &= \sum_{i=1}^{n} \hat{c}_i \\
&= \sum_{i=1}^{n} \bigl( c_i + \Phi(D_i) - \Phi(D_{i-1}) \bigr) \\
&= \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0)
\end{align*}

(telescoping sum: every term other than D_0 and D_n
is added once and subtracted once).
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If we require that <t> (D,) > <P(D 0 ) for all i, then the amortized cost is always an 
upper bound on actual cost. 

In practice: d>(D 0 ) = 0, <f> (D,) > 0 for all i. 

Stack

\(\Phi\) = # of objects in stack

(= # of $1 bills in accounting method)

\(D_0\) = empty stack \(\Rightarrow \Phi(D_0) = 0\).

Since # of objects in stack is always \(\ge 0\), \(\Phi(D_i) \ge 0 = \Phi(D_0)\) for all \(i\).

operation    actual cost       change in potential     amortized cost
Push         1                 (s + 1) - s = 1         1 + 1 = 2
Pop          1                 (s - 1) - s = -1        1 - 1 = 0
Multipop     k' = min(k, s)    (s - k') - s = -k'      k' - k' = 0

where s = # of objects initially.

Therefore, amortized cost of a sequence of n operations = O(n). 

Binary counter 

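\(\Phi = b_i\) = # of 1’s after \(i\)th INCREMENT

Suppose \(i\)th operation resets \(t_i\) bits to 0.

\(c_i \le t_i + 1\) (resets \(t_i\) bits, sets \(\le 1\) bit to 1)

• If \(b_i = 0\), the \(i\)th operation reset all \(k\) bits and didn’t set one, so
\(b_{i-1} = t_i = k \Rightarrow b_i = b_{i-1} - t_i\).

• If \(b_i > 0\), the \(i\)th operation reset \(t_i\) bits, set one, so
\(b_i = b_{i-1} - t_i + 1\).

• Either way, \(b_i \le b_{i-1} - t_i + 1\).

• Therefore,

\[
\begin{aligned}
\Delta\Phi(D_i) &\le (b_{i-1} - t_i + 1) - b_{i-1} \\
                &= 1 - t_i .
\end{aligned}
\]

\[
\begin{aligned}
\hat{c}_i &= c_i + \Delta\Phi(D_i) \\
          &\le (t_i + 1) + (1 - t_i) \\
          &= 2 .
\end{aligned}
\]

If counter starts at 0, \(\Phi(D_0) = 0\).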

Therefore, amortized cost of n operations = O(n). 
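The derivation above can be replayed numerically by taking the potential to be the number of 1 bits and computing actual cost plus change in potential for every INCREMENT. This is a hedged sketch; the function name `amortized_costs` and the default width `k=8` are illustrative choices, not from the notes.

```python
def amortized_costs(n, k=8):
    """Return the amortized cost (actual cost + change in potential) of each of
    n INCREMENTs on a k-bit counter starting at 0, with potential = # of 1 bits."""
    bits = [0] * k
    phi = 0  # potential of the initial (all-zero) counter
    costs = []
    for _ in range(n):
        cost = 0
        i = 0
        while i < k and bits[i] == 1:  # reset the trailing 1s
            bits[i] = 0
            cost += 1
            i += 1
        if i < k:  # set at most one bit
            bits[i] = 1
            cost += 1
        new_phi = sum(bits)
        costs.append(cost + new_phi - phi)
        phi = new_phi
    return costs
```

Every entry is at most 2. The only entry below 2 is the increment that overflows the counter, which resets all k bits and sets none.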


Dynamic tables 


A nice use of amortized analysis. 

Scenario

• Have a table—maybe a hash table. 

• Don’t know in advance how many objects will be stored in it. 

• When it fills, must reallocate with a larger size, copying all objects into the new, 
larger table. 

• When it gets sufficiently small, might want to reallocate with a smaller size. 
Details of table organization not important. 

Goals 

1. \(O(1)\) amortized time per operation.

2. Unused space always \(\le\) constant fraction of allocated space.

Load factor \(\alpha\) = num/size, where num = # items stored, size = allocated size.

If size = 0, then num = 0. Call \(\alpha = 1\).

Never allow \(\alpha > 1\).

Keep \(\alpha\) > a constant fraction \(\Rightarrow\) goal (2).


Table expansion 

Consider only insertion. 

• When the table becomes full, double its size and reinsert all existing items. 

• Guarantees that \(\alpha \ge 1/2\).

• Each time we actually insert an item into the table, it’s an elementary insertion. 

Table-Insert(T, x)
if size[T] = 0
    then allocate table[T] with 1 slot
         size[T] ← 1
if num[T] = size[T]                                        ▷ expand?
    then allocate new-table with 2 · size[T] slots
         insert all items in table[T] into new-table       ▷ num[T] elem insertions
         free table[T]
         table[T] ← new-table
         size[T] ← 2 · size[T]
insert x into table[T]                                     ▷ 1 elem insertion
num[T] ← num[T] + 1


Initially, num[T] = size[T] = 0.
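The pseudocode translates almost line for line into a runnable sketch. The class name `Table`, the `slots` list standing in for table[T], and the decision to return the number of elementary insertions are assumptions made for illustration.

```python
class Table:
    """Dynamic table that doubles when full (expansion only)."""

    def __init__(self):
        self.slots = []  # allocated storage; len(self.slots) plays the role of size[T]
        self.num = 0     # number of items stored, num[T]

    @property
    def size(self):
        return len(self.slots)

    def insert(self, x):
        """Insert x and return the number of elementary insertions performed."""
        cost = 0
        if self.size == 0:
            self.slots = [None]  # allocate a table with 1 slot
        if self.num == self.size:  # full, so expand
            new_slots = [None] * (2 * self.size)
            for j in range(self.num):  # num[T] elementary insertions
                new_slots[j] = self.slots[j]
                cost += 1
            self.slots = new_slots  # the old storage is dropped here
        self.slots[self.num] = x  # 1 elementary insertion
        self.num += 1
        return cost + 1
```

Inserting ten items in a row gives per-operation costs 1, 2, 3, 1, 5, 1, 1, 1, 9, 1: an expensive operation occurs exactly when the number of items already stored is a power of 2.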

Running time: Charge 1 per elementary insertion. Count only elementary insertions,
since all other costs together are constant per call.

\(c_i\) = actual cost of \(i\)th operation

• If not full, \(c_i = 1\).

• If full, have \(i - 1\) items in the table at the start of the \(i\)th operation. Have to
copy all \(i - 1\) existing items, then insert \(i\)th item \(\Rightarrow c_i = i\).

\(n\) operations \(\Rightarrow c_i = O(n) \Rightarrow O(n^2)\) time for \(n\) operations.

Of course, we don’t always expand:

\[
c_i = \begin{cases} i & \text{if } i - 1 \text{ is exact power of 2} , \\ 1 & \text{otherwise} . \end{cases}
\]

\[
\begin{aligned}
\text{Total cost} &= \sum_{i=1}^{n} c_i \\
 &\le n + \sum_{j=0}^{\lfloor \lg n \rfloor} 2^j \\
 &= n + \frac{2^{\lfloor \lg n \rfloor + 1} - 1}{2 - 1} \\
 &< n + 2n \\
 &= 3n .
\end{aligned}
\]

Therefore, aggregate analysis says amortized cost per operation = 3.

Accounting method 

Charge $3 per insertion of x. 

• $1 pays for x’s insertion.

• $1 pays for x to be moved in the future.

• $1 pays for some other item to be moved.

Suppose we’ve just expanded: size = m before that expansion, size = 2m after it,
so the table holds m items just after the expansion.

• Assume that the expansion used up all the credit, so that there’s no credit stored 
after the expansion. 

• Will expand again after another m insertions. 

• Each insertion will put $1 on one of the m items that were in the table just after 
expansion and will put $1 on the item inserted. 

• Have $2m of credit by next expansion, when there are 2m items to move. Just 
enough to pay for the expansion, with no credit left over! 

Potential method

\(\Phi(T) = 2 \cdot num[T] - size[T]\)

• Initially, num = size = 0 \(\Rightarrow \Phi = 0\).

• Just after expansion, size = 2 · num \(\Rightarrow \Phi = 0\).

• Just before expansion, size = num \(\Rightarrow \Phi\) = num \(\Rightarrow\) have enough potential to
pay for moving all items.

• Need \(\Phi \ge 0\), always.

Always have

size \(\ge\) num \(\ge\) size/2 \(\Rightarrow\) 2 · num \(\ge\) size \(\Rightarrow \Phi \ge 0\).

Amortized cost of \(i\)th operation:

\(num_i\) = num after \(i\)th operation,
\(size_i\) = size after \(i\)th operation,
\(\Phi_i\) = \(\Phi\) after \(i\)th operation.


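• If no expansion:

\(size_i = size_{i-1}\),
\(num_i = num_{i-1} + 1\),
\(c_i = 1\).

Then we have

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
 &= 1 + (2 \cdot num_i - size_i) - (2 \cdot num_{i-1} - size_{i-1}) \\
 &= 1 + (2 \cdot num_i - size_i) - (2 (num_i - 1) - size_i) \\
 &= 1 + 2 \\
 &= 3 .
\end{aligned}
\]

• If expansion:

\(size_i = 2 \cdot size_{i-1}\),
\(size_{i-1} = num_{i-1} = num_i - 1\),
\(c_i = num_{i-1} + 1 = num_i\).

Then we have

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
 &= num_i + (2 \cdot num_i - size_i) - (2 \cdot num_{i-1} - size_{i-1}) \\
 &= num_i + (2 \cdot num_i - 2 (num_i - 1)) - (2 (num_i - 1) - (num_i - 1)) \\
 &= num_i + 2 - (num_i - 1) \\
 &= 3 .
\end{aligned}
\]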
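Both cases can be checked mechanically by tracking num, size, and the potential 2 · num − size through a run of insertions. The function name `amortized_insert_costs` is a hypothetical helper; the cost model is one unit per elementary insertion, as in the notes.

```python
def amortized_insert_costs(n):
    """Amortized cost of each of n insertions into an initially empty table,
    using the potential 2 * num - size."""
    num = size = 0
    phi = 0  # potential of the empty table
    costs = []
    for _ in range(n):
        cost = 1  # inserting the new item
        if size == 0:
            size = 1
        if num == size:  # expansion: copy all existing items
            cost += num
            size *= 2
        num += 1
        new_phi = 2 * num - size
        costs.append(cost + new_phi - phi)
        phi = new_phi
    return costs
```

Every insertion after the first comes out at exactly 3. The very first insertion comes out at 2, because the table grows from 0 slots to 1 rather than doubling, so neither case of the derivation applies to it exactly.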

Expansion and contraction

When \(\alpha\) drops too low, contract the table.

• Allocate a new, smaller one.

• Copy all items.

Still want

• \(\alpha\) bounded from below by a constant,

• amortized cost per operation = \(O(1)\).

Measure cost in terms of elementary insertions and deletions.

“Obvious” strategy:

• Double size when inserting into a full table (when \(\alpha = 1\), so that after insertion
\(\alpha\) would become > 1).

• Halve size when deletion would make table less than half full (when \(\alpha = 1/2\),
so that after deletion \(\alpha\) would become < 1/2).

• Then always have \(1/2 \le \alpha \le 1\).

• Suppose we fill table.

Then insert \(\Rightarrow\) double
2 deletes \(\Rightarrow\) halve
2 inserts \(\Rightarrow\) double
2 deletes \(\Rightarrow\) halve
· · ·

Not performing enough operations after expansion or contraction to pay for the
next one.

Simple solution:

• Double as before: when inserting with \(\alpha = 1\) \(\Rightarrow\) after doubling, \(\alpha = 1/2\).

• Halve size when deleting with \(\alpha = 1/4\) \(\Rightarrow\) after halving, \(\alpha = 1/2\).

• Thus, immediately after either expansion or contraction, have \(\alpha = 1/2\).

• Always have \(1/4 \le \alpha \le 1\).






Intuition: 

• Want to make sure that we perform enough operations between consecutive 
expansions/contractions to pay for the change in table size. 

• Need to delete half the items before contraction. 

• Need to double number of items before expansion. 

• Either way, number of operations between expansions/contractions is at least a 
constant fraction of number of items copied. 

\[
\Phi(T) = \begin{cases} 2 \cdot num[T] - size[T] & \text{if } \alpha \ge 1/2 , \\ size[T]/2 - num[T] & \text{if } \alpha < 1/2 . \end{cases}
\]

T empty \(\Rightarrow \Phi = 0\).

\(\alpha \ge 1/2 \Rightarrow\) num \(\ge \frac{1}{2}\) · size \(\Rightarrow\) 2 · num \(\ge\) size \(\Rightarrow \Phi \ge 0\).

\(\alpha < 1/2 \Rightarrow\) num \(< \frac{1}{2}\) · size \(\Rightarrow \Phi > 0\).

Intuition: \(\Phi\) measures how far from \(\alpha = 1/2\) we are.

• \(\alpha = 1/2 \Rightarrow \Phi\) = 2 · num − 2 · num = 0.

• \(\alpha = 1 \Rightarrow \Phi\) = 2 · num − num = num.

• \(\alpha = 1/4 \Rightarrow \Phi\) = size/2 − num = 4 · num/2 − num = num.

• Therefore, when we double or halve, have enough potential to pay for moving
all num items.

• Potential increases linearly between \(\alpha = 1/2\) and \(\alpha = 1\), and it also increases
linearly between \(\alpha = 1/2\) and \(\alpha = 1/4\).

• Since \(\alpha\) has different distances to go to get to 1 or 1/4, starting from 1/2, rate
of increase of \(\Phi\) differs.

  • For \(\alpha\) to go from 1/2 to 1, num increases from size/2 to size, for a total
  increase of size/2. \(\Phi\) increases from 0 to size. Thus, \(\Phi\) needs to increase
  by 2 for each item inserted. That’s why there’s a coefficient of 2 on the
  num[T] term in the formula for \(\Phi\) when \(\alpha \ge 1/2\).

  • For \(\alpha\) to go from 1/2 to 1/4, num decreases from size/2 to size/4, for a total
  decrease of size/4. \(\Phi\) increases from 0 to size/4. Thus, \(\Phi\) needs to increase
  by 1 for each item deleted. That’s why there’s a coefficient of −1 on the
  num[T] term in the formula for \(\Phi\) when \(\alpha < 1/2\).

Amortized costs: more cases 

• insert, delete 

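• \(\alpha \ge 1/2\), \(\alpha < 1/2\) (use \(\alpha_i\), since \(\alpha\) can vary a lot)

• size does/doesn’t change

Insert:

• \(\alpha_{i-1} \ge 1/2\), same analysis as before \(\Rightarrow \hat{c}_i = 3\).

• \(\alpha_{i-1} < 1/2 \Rightarrow\) no expansion (only occurs when \(\alpha_{i-1} = 1\)).

  • If \(\alpha_{i-1} < 1/2\) and \(\alpha_i < 1/2\):

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
 &= 1 + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1}) \\
 &= 1 + (size_i/2 - num_i) - (size_i/2 - (num_i - 1)) \\
 &= 0 .
\end{aligned}
\]

  • If \(\alpha_{i-1} < 1/2\) and \(\alpha_i \ge 1/2\):

\[
\begin{aligned}
\hat{c}_i &= 1 + (2 \cdot num_i - size_i) - (size_{i-1}/2 - num_{i-1}) \\
 &= 1 + (2 (num_{i-1} + 1) - size_{i-1}) - (size_{i-1}/2 - num_{i-1}) \\
 &= 3 \cdot num_{i-1} - \tfrac{3}{2} \cdot size_{i-1} + 3 \\
 &= 3 \cdot \alpha_{i-1} \, size_{i-1} - \tfrac{3}{2} \cdot size_{i-1} + 3 \\
 &< \tfrac{3}{2} \cdot size_{i-1} - \tfrac{3}{2} \cdot size_{i-1} + 3 \\
 &= 3 .
\end{aligned}
\]

Therefore, amortized cost of insert is \(\le 3\).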

Delete:

• If \(\alpha_{i-1} < 1/2\), then \(\alpha_i < 1/2\).

  • If no contraction:

\[
\begin{aligned}
\hat{c}_i &= 1 + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1}) \\
 &= 1 + (size_i/2 - num_i) - (size_i/2 - (num_i + 1)) \\
 &= 2 .
\end{aligned}
\]

  • If contraction (the actual cost \(num_i + 1\) is move + delete, and
  \(size_i/2 = size_{i-1}/4 = num_{i-1} = num_i + 1\)):

\[
\begin{aligned}
\hat{c}_i &= (num_i + 1) + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1}) \\
 &= (num_i + 1) + ((num_i + 1) - num_i) - ((2 \cdot num_i + 2) - (num_i + 1)) \\
 &= 1 .
\end{aligned}
\]

• If \(\alpha_{i-1} \ge 1/2\), then no contraction.

  • If \(\alpha_i \ge 1/2\):

\[
\begin{aligned}
\hat{c}_i &= 1 + (2 \cdot num_i - size_i) - (2 \cdot num_{i-1} - size_{i-1}) \\
 &= 1 + (2 \cdot num_i - size_i) - (2 \cdot num_i + 2 - size_i) \\
 &= -1 .
\end{aligned}
\]

  • If \(\alpha_i < 1/2\), since \(\alpha_{i-1} \ge 1/2\), have

\[
num_i = num_{i-1} - 1 \ge \tfrac{1}{2} \cdot size_{i-1} - 1 = \tfrac{1}{2} \cdot size_i - 1 .
\]

  Thus,

\[
\begin{aligned}
\hat{c}_i &= 1 + (size_i/2 - num_i) - (2 \cdot num_{i-1} - size_{i-1}) \\
 &= 1 + (size_i/2 - num_i) - (2 \cdot num_i + 2 - size_i) \\
 &= -1 + \tfrac{3}{2} \cdot size_i - 3 \cdot num_i \\
 &\le -1 + \tfrac{3}{2} \cdot size_i - 3 \left( \tfrac{1}{2} \cdot size_i - 1 \right) \\
 &= 2 .
\end{aligned}
\]

Therefore, amortized cost of delete is \(\le 2\).
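The case analysis for insert and delete can be exercised with a small simulation of the "simple solution": double when full, halve when deleting at load factor 1/4. This is a sketch under stated assumptions: the class `DynamicTable` tracks only num and size (no items), storage is not freed when the table becomes empty, and the random driver `max_amortized` is a hypothetical test harness.

```python
import random


class DynamicTable:
    """Tracks only num and size; each method returns the actual cost in
    elementary insertions/deletions (including items moved)."""

    def __init__(self):
        self.num = 0
        self.size = 0

    def potential(self):
        # Load factor >= 1/2 (an empty, unallocated table counts as load factor 1).
        if self.size == 0 or 2 * self.num >= self.size:
            return 2 * self.num - self.size
        return self.size / 2 - self.num

    def insert(self):
        cost = 1
        if self.size == 0:
            self.size = 1
        if self.num == self.size:  # full: double and move every item
            cost += self.num
            self.size *= 2
        self.num += 1
        return cost

    def delete(self):
        assert self.num > 0
        cost = 1
        if 4 * self.num == self.size:  # load factor 1/4: halve, move the survivors
            cost += self.num - 1
            self.size //= 2
        self.num -= 1
        return cost


def max_amortized(steps, seed=0):
    """Run a random insert/delete sequence; return the largest amortized cost
    seen for an insert and for a delete."""
    rng = random.Random(seed)
    table = DynamicTable()
    worst_insert = worst_delete = float("-inf")
    for _ in range(steps):
        before = table.potential()
        if table.num == 0 or rng.random() < 0.5:
            cost = table.insert()
            worst_insert = max(worst_insert, cost + table.potential() - before)
        else:
            cost = table.delete()
            worst_delete = max(worst_delete, cost + table.potential() - before)
        assert table.potential() >= 0
    return worst_insert, worst_delete
```

On any run, the first returned value should not exceed 3 and the second should not exceed 2, in line with the bounds derived above.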



Solutions for Chapter 17: 
Amortized Analysis 


Solution to Exercise 17.1-3 


Let \(c_i\) = cost of \(i\)th operation.

\[
c_i = \begin{cases} i & \text{if } i \text{ is an exact power of 2} , \\ 1 & \text{otherwise} . \end{cases}
\]

Operation    1   2   3   4   5   6   7   8   9   10
Cost         1   2   1   4   1   1   1   8   1   1

\(n\) operations cost

\[
\sum_{i=1}^{n} c_i \le n + \sum_{j=0}^{\lg n} 2^j = n + (2n - 1) < 3n .
\]

(Note: Ignoring floor in upper bound of \(\sum 2^j\).)

Average cost of operation = (total cost) / (# operations) < 3.

By aggregate analysis, the amortized cost per operation = \(O(1)\).


Solution to Exercise 17.2-1 

[We assume that the only way in which COPY is invoked is automatically, after
every sequence of k PUSH and POP operations.]

Charge $2 for each PUSH and POP operation and $0 for each COPY. When we call
PUSH, we use $1 to pay for the operation, and we store the other $1 on the item
pushed. When we call POP, we again use $1 to pay for the operation, and we store
the other $1 in the stack itself. Because the stack size never exceeds k, the actual
cost of a COPY operation is at most $k, which is paid by the $k found in the items in
the stack and the stack itself. Since there are k PUSH and POP operations between
two consecutive COPY operations, there are $k of credit stored, either on individual
items (from PUSH operations) or in the stack itself (from POP operations) by the
time a COPY occurs. Since the amortized cost of each operation is \(O(1)\) and the
amount of credit never goes negative, the total cost of \(n\) operations is \(O(n)\).


Solution to Exercise 17.2-2 


Let \(c_i\) = cost of \(i\)th operation.

\[
c_i = \begin{cases} i & \text{if } i \text{ is an exact power of 2} , \\ 1 & \text{otherwise} . \end{cases}
\]

Charge each operation $3 (amortized cost \(\hat{c}_i\)).

• If \(i\) is not an exact power of 2, pay $1, and store $2 as credit.
• If \(i\) is an exact power of 2, pay $\(i\), using stored credit.

Operation    Charge    Actual cost    Credit remaining
1            3         1              2
2            3         2              3
3            3         1              5
4            3         4              4
5            3         1              6
6            3         1              8
7            3         1              10
8            3         8              5
9            3         1              7
10           3         1              9

Since the amortized cost is $3 per operation, \(\sum_{i=1}^{n} \hat{c}_i = 3n\).

We know from Exercise 17.1-3 that \(\sum_{i=1}^{n} c_i < 3n\).

Then we have \(\sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i \Rightarrow\) credit = amortized cost − actual cost \(\ge 0\).

Since the amortized cost of each operation is \(O(1)\), and the amount of credit never
goes negative, the total cost of \(n\) operations is \(O(n)\).
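The "credit remaining" column can be regenerated with a few lines of code. The function name `credit_trace` is an illustrative assumption; the charge of $3 and the cost rule come from the solution above.

```python
def credit_trace(n, charge=3):
    """Credit remaining after each of the first n operations, where operation i
    costs i if i is an exact power of 2 and costs 1 otherwise."""
    credit = 0
    trace = []
    for i in range(1, n + 1):
        is_power_of_two = (i & (i - 1)) == 0  # true for 1, 2, 4, 8, ...
        actual = i if is_power_of_two else 1
        credit += charge - actual
        trace.append(credit)
    return trace
```

The first ten entries reproduce the table, and the running credit stays nonnegative for longer sequences as well.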

Solution to Exercise 17.2-3

We introduce a new field max[A] to hold the index of the high-order 1 in A. Initially,
max[A] is set to −1, since the low-order bit of A is at index 0, and there
are initially no 1’s in A. The value of max[A] is updated as appropriate when the
counter is incremented or reset, and we use this value to limit how much of A must
be looked at to reset it. By controlling the cost of Reset in this way, we can limit
it to an amount that can be covered by credit from earlier Increments.

Increment(A)
i ← 0
while i < length[A] and A[i] = 1
    do A[i] ← 0
       i ← i + 1
if i < length[A]
    then A[i] ← 1
         ▷ Additions to book’s INCREMENT start here.
         if i > max[A]
             then max[A] ← i
    else max[A] ← −1

Reset(A)
for i ← 0 to max[A]
    do A[i] ← 0
max[A] ← −1

As for the counter in the book, we assume that it costs $1 to flip a bit. In addition,
we assume it costs $1 to update max[A].

Setting and resetting of bits by INCREMENT will work exactly as for the original 
counter in the book: $1 will pay to set one bit to 1; $1 will be placed on the bit 
that is set to 1 as credit; the credit on each 1 bit will pay to reset the bit during 
incrementing. 

In addition, we’ll use $1 to pay to update max, and if max increases, we’ll place 
an additional $1 of credit on the new high-order 1. (If max doesn’t increase, we 
can just waste that $1— it won’t be needed.) Since Reset manipulates bits at 
positions only up to max[A], and since each bit up to there must have become the 
high-order 1 at some time before the high-order 1 got up to max[A], every bit seen 
by Reset has $1 of credit on it. So the zeroing of bits of A by Reset can be 
completely paid for by the credit stored on the bits. We just need $1 to pay for 
resetting max. 

Thus charging $4 for each INCREMENT and $1 for each Reset is sufficient, so the 
sequence of n INCREMENT and Reset operations takes O(n) time. 
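A runnable sketch of the two procedures follows, with costs counted as in the solution: $1 per bit written and $1 per update of max. The class name `Counter`, the helper `simulate`, and the choice to count every bit that Reset zeroes (whether or not it was a 1) are assumptions made for illustration.

```python
class Counter:
    """k-bit counter (little-endian list) with INCREMENT and RESET."""

    def __init__(self, k):
        self.bits = [0] * k
        self.max = -1  # index of the high-order 1, or -1 if there is none

    def increment(self):
        """Return the actual cost: bits written plus any update of max."""
        cost = 0
        i = 0
        while i < len(self.bits) and self.bits[i] == 1:
            self.bits[i] = 0
            cost += 1
            i += 1
        if i < len(self.bits):
            self.bits[i] = 1
            cost += 1
            if i > self.max:
                self.max = i
                cost += 1
        else:  # overflow: the counter is all zeros again
            self.max = -1
            cost += 1
        return cost

    def reset(self):
        """Zero bits 0..max, then reset max; return the actual cost."""
        cost = 0
        for i in range(self.max + 1):
            self.bits[i] = 0
            cost += 1
        self.max = -1
        return cost + 1


def simulate(k, ops):
    """ops is a list of 'inc' / 'reset' strings. Returns (actual, charged),
    charging $4 per INCREMENT and $1 per RESET."""
    counter = Counter(k)
    actual = charged = 0
    for op in ops:
        if op == "inc":
            actual += counter.increment()
            charged += 4
        else:
            actual += counter.reset()
            charged += 1
    return actual, charged
```

For any operation sequence, the actual total returned by `simulate` should not exceed the charged total.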

Solution to Exercise 17.3-3


Let \(D_i\) be the heap after the \(i\)th operation, and let \(D_i\) consist of \(n_i\) elements. Also,
let \(k\) be a constant such that each INSERT or Extract-Min operation takes at
most \(k \ln n\) time, where \(n = \max(n_{i-1}, n_i)\). (We don’t want to worry about taking
the log of 0, and at least one of \(n_{i-1}\) and \(n_i\) is at least 1. We’ll see later why we use
the natural log.)

Define

\[
\Phi(D_i) = \begin{cases} 0 & \text{if } n_i = 0 , \\ k n_i \ln n_i & \text{if } n_i > 0 . \end{cases}
\]

This function exhibits the characteristics we like in a potential function: if we start
with an empty heap, then \(\Phi(D_0) = 0\), and we always maintain that \(\Phi(D_i) \ge 0\).

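Before proving that we achieve the desired amortized times, we show that if \(n \ge 2\),
then \(n \ln \frac{n}{n-1} \le 2\). We have

\[
\begin{aligned}
n \ln \frac{n}{n-1} &= n \ln \left( 1 + \frac{1}{n-1} \right) \\
 &= \ln \left( 1 + \frac{1}{n-1} \right)^{n} \\
 &\le \ln \left( e^{\frac{1}{n-1}} \right)^{n} \quad \text{(since } 1 + x \le e^x \text{ for all real } x \text{)} \\
 &= \ln e^{\frac{n}{n-1}} \\
 &= \frac{n}{n-1} \\
 &\le 2 ,
\end{aligned}
\]

assuming that \(n \ge 2\). (The equation \(\ln e^{\frac{n}{n-1}} = \frac{n}{n-1}\) is why we use the natural log.)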

If the \(i\)th operation is an INSERT, then \(n_i = n_{i-1} + 1\). If the \(i\)th operation inserts
into an empty heap, then \(n_i = 1\), \(n_{i-1} = 0\), and the amortized cost is

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi(D_i) - \Phi(D_{i-1}) \\
 &\le k \ln 1 + k \cdot 1 \ln 1 - 0 \\
 &= 0 .
\end{aligned}
\]

If the \(i\)th operation inserts into a nonempty heap, then \(n_i = n_{i-1} + 1\), and the
amortized cost is

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi(D_i) - \Phi(D_{i-1}) \\
 &\le k \ln n_i + k n_i \ln n_i - k n_{i-1} \ln n_{i-1} \\
 &= k \ln n_i + k n_i \ln n_i - k (n_i - 1) \ln (n_i - 1) \\
 &= k \ln n_i + k n_i \ln n_i - k n_i \ln (n_i - 1) + k \ln (n_i - 1) \\
 &< 2k \ln n_i + k n_i \ln \frac{n_i}{n_i - 1} \\
 &\le 2k \ln n_i + 2k \\
 &= O(\lg n_i) .
\end{aligned}
\]


If the \(i\)th operation is an Extract-Min, then \(n_i = n_{i-1} - 1\). If the \(i\)th operation
extracts the one and only heap item, then \(n_i = 0\), \(n_{i-1} = 1\), and the amortized cost
is

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi(D_i) - \Phi(D_{i-1}) \\
 &\le k \ln 1 + 0 - k \cdot 1 \ln 1 \\
 &= 0 .
\end{aligned}
\]

If the \(i\)th operation extracts from a heap with more than 1 item, then \(n_i = n_{i-1} - 1\)
and \(n_{i-1} \ge 2\), and the amortized cost is

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi(D_i) - \Phi(D_{i-1}) \\
 &\le k \ln n_{i-1} + k n_i \ln n_i - k n_{i-1} \ln n_{i-1} \\
 &= k \ln n_{i-1} + k (n_{i-1} - 1) \ln (n_{i-1} - 1) - k n_{i-1} \ln n_{i-1} \\
 &= k \ln n_{i-1} + k n_{i-1} \ln (n_{i-1} - 1) - k \ln (n_{i-1} - 1) - k n_{i-1} \ln n_{i-1} \\
 &= k \ln \frac{n_{i-1}}{n_{i-1} - 1} + k n_{i-1} \ln \frac{n_{i-1} - 1}{n_{i-1}} \\
 &< k \ln \frac{n_{i-1}}{n_{i-1} - 1} + k n_{i-1} \ln 1 \\
 &= k \ln \frac{n_{i-1}}{n_{i-1} - 1} \\
 &\le k \ln 2 \quad \text{(since } n_{i-1} \ge 2 \text{)} \\
 &= O(1) .
\end{aligned}
\]


A slightly different potential function—which may be easier to work with—is as
follows. For each node \(x\) in the heap, let \(d_i(x)\) be the depth of \(x\) in \(D_i\). Define

\[
\begin{aligned}
\Phi(D_i) &= \sum_{x \in D_i} k \, (d_i(x) + 1) \\
 &= k \Bigl( n_i + \sum_{x \in D_i} d_i(x) \Bigr) ,
\end{aligned}
\]

where \(k\) is defined as before.

Initially, the heap has no items, which means that the sum is over an empty set, and
so \(\Phi(D_0) = 0\). We always have \(\Phi(D_i) \ge 0\), as required.

Observe that after an INSERT, the sum changes only by an amount equal to the
depth of the new last node of the heap, which is \(\lfloor \lg n_i \rfloor\). Thus, the change
in potential due to an INSERT is \(k (1 + \lfloor \lg n_i \rfloor)\), and so the amortized cost is
\(O(\lg n_i) + O(\lg n_i) = O(\lg n_i) = O(\lg n)\).

After an Extract-Min, the sum changes by the negative of the depth of the old
last node in the heap, and so the potential decreases by \(k (1 + \lfloor \lg n_{i-1} \rfloor)\). The
amortized cost is at most \(k \lg n_{i-1} - k (1 + \lfloor \lg n_{i-1} \rfloor) = O(1)\).



Solution to Problem 17-2 

a. The Search operation can be performed by searching each of the individually
sorted arrays. Since all the individual arrays are sorted, searching one of them
using a binary search algorithm takes \(O(\lg m)\) time, where \(m\) is the size of the
array. In an unsuccessful search, the time is \(\Theta(\lg m)\). In the worst case, we may
assume that all the arrays \(A_0, A_1, \ldots, A_{k-1}\) are full, \(k = \lceil \lg(n + 1) \rceil\), and we
perform an unsuccessful search. The total time taken is

\[
\begin{aligned}
T(n) &= \Theta(\lg 2^{k-1} + \lg 2^{k-2} + \cdots + \lg 2^1 + \lg 2^0) \\
 &= \Theta((k - 1) + (k - 2) + \cdots + 1 + 0) \\
 &= \Theta(k (k - 1) / 2) \\
 &= \Theta(\lceil \lg(n + 1) \rceil (\lceil \lg(n + 1) \rceil - 1) / 2) \\
 &= \Theta(\lg^2 n) .
\end{aligned}
\]

Thus, the worst-case running time is \(\Theta(\lg^2 n)\).

b. We create a new sorted array of size 1 containing the new element to be inserted.
If array \(A_0\) (which has size 1) is empty, then we replace \(A_0\) with the new sorted
array. Otherwise, we merge sort the two arrays into another sorted array of
size 2. If \(A_1\) is empty, then we replace \(A_1\) with the new array; otherwise we
merge sort the arrays as before and continue. Since array \(A_i\) is of size \(2^i\), if we
merge sort two arrays of size \(2^i\) each, we obtain one of size \(2^{i+1}\), which is the
size of \(A_{i+1}\). Thus, this method will result in another list of arrays in the same
structure that we had before.

Let us analyze its worst-case running time. We will assume that merge sort
takes \(2m\) time to merge two sorted lists of size \(m\) each. If all the arrays
\(A_0, A_1, \ldots, A_{k-2}\) are full, then the running time to fill array \(A_{k-1}\) would be

\[
\begin{aligned}
T(n) &= 2 \, (2^0 + 2^1 + \cdots + 2^{k-2}) \\
 &= 2 \, (2^{k-1} - 1) \\
 &= 2^k - 2 \\
 &= \Theta(n) .
\end{aligned}
\]

Therefore, the worst-case time to insert an element into this data structure is
\(\Theta(n)\).

However, let us now analyze the amortized running time. Using the aggregate
method, we compute the total cost of a sequence of \(n\) inserts, starting with the
empty data structure. Let \(r\) be the position of the rightmost 0 in the binary
representation \(\langle n_{k-1}, n_{k-2}, \ldots, n_0 \rangle\) of \(n\), so that \(n_j = 1\) for \(j = 0, 1, \ldots, r - 1\).
The cost of an insertion when \(n\) items have already been inserted is

\[
\sum_{j=0}^{r-1} 2 \cdot 2^j = O(2^r) .
\]

Furthermore, \(r = 0\) half the time, \(r = 1\) a quarter of the time, and so on.
There are at most \(\lceil n / 2^r \rceil\) insertions for each value of \(r\). The total cost of the \(n\)
operations is therefore bounded by

\[
O\left( \sum_{r=0}^{\lceil \lg(n+1) \rceil} \left\lceil \frac{n}{2^r} \right\rceil 2^r \right) = O(n \lg n) .
\]

The amortized cost per INSERT operation, therefore is \(O(\lg n)\).

We can also use the accounting method to analyze the running time. We can
charge $k to insert an element. $1 pays for the insertion, and we put $(k − 1)
on the inserted item to pay for it being involved in merges later on. Each time
it is merged, it moves to a higher-indexed array, i.e., from \(A_i\) to \(A_{i+1}\). It can
move to a higher-indexed array at most \(k - 1\) times, and so the $(k − 1) on the
item suffices to pay for all the times it will ever be involved in merges. Since
\(k = \Theta(\lg n)\), we have an amortized cost of \(\Theta(\lg n)\) per insertion.
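Parts (a) and (b) describe a structure that behaves like a binary counter whose "bits" are sorted arrays. The sketch below is one way to realize it; the class name `DynamicBinarySearch`, the use of an empty list to mark an empty array, and the reliance on `bisect` and `heapq.merge` from the standard library are choices made for this illustration rather than anything prescribed by the solution.

```python
from bisect import bisect_left
from heapq import merge


class DynamicBinarySearch:
    """arrays[i] is either empty or a sorted list of exactly 2**i keys."""

    def __init__(self):
        self.arrays = []

    def search(self, key):
        """Binary-search every nonempty array in turn."""
        for array in self.arrays:
            pos = bisect_left(array, key)
            if pos < len(array) and array[pos] == key:
                return True
        return False

    def insert(self, key):
        """Merge full arrays upward, like carrying in a binary increment."""
        carry = [key]
        i = 0
        while True:
            if i == len(self.arrays):
                self.arrays.append([])
            if not self.arrays[i]:
                self.arrays[i] = carry  # found an empty slot for the carry
                return
            carry = list(merge(self.arrays[i], carry))  # two sorted runs of size 2**i
            self.arrays[i] = []
            i += 1
```

After inserting 11 keys, the nonempty arrays are those at indices 0, 1, and 3, mirroring the binary representation 1011 of 11.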

c. Delete(x) will be implemented as follows:

1. Find the smallest \(j\) for which the array \(A_j\) with \(2^j\) elements is full. Let \(y\) be
the last element of \(A_j\).

2. Let \(x\) be in the array \(A_i\). If necessary, find which array this is by using the
search procedure.

3. Remove \(x\) from \(A_i\) and put \(y\) into \(A_i\). Then move \(y\) to its correct place in \(A_i\).

4. Divide \(A_j\) (which now has \(2^j - 1\) elements left): The first element goes into
array \(A_0\), the next 2 elements go into array \(A_1\), the next 4 elements go into
array \(A_2\), and so forth. Mark array \(A_j\) as empty. The new arrays are created
already sorted.

The cost of Delete is \(\Theta(n)\) in the worst case, where \(i = k - 1\) and \(j = k - 2\):
\(\Theta(\lg n)\) to find \(A_j\), \(\Theta(\lg^2 n)\) to find \(A_i\), \(\Theta(2^i) = \Theta(n)\) to put \(y\) in its
correct place in array \(A_i\), and \(\Theta(2^j) = \Theta(n)\) to divide array \(A_j\). The following
sequence of \(n\) operations, where \(n/3\) is a power of 2, yields an amortized cost
that is no better: perform \(n/3\) INSERT operations, followed by \(n/3\) pairs of
Delete and Insert. It costs \(O(n \lg n)\) to do the first \(n/3\) Insert operations.
This creates a single full array. Each subsequent Delete/Insert pair costs
\(\Theta(n)\) for the Delete to divide the full array and another \(\Theta(n)\) for the Insert
to recombine it. The total is then \(\Theta(n^2)\), or \(\Theta(n)\) per operation.


Solution to Problem 17-4 

a. For RB-Insert, consider a complete red-black tree in which the colors alternate
between levels. That is, the root is black, the children of the root are red,
the grandchildren of the root are black, the great-grandchildren of the root are
red, and so on. When a node is inserted as a red child of one of the red leaves,
then case 1 of RB-Insert-Fixup occurs \((\lg(n + 1))/2\) times, so that there are
\(\Omega(\lg n)\) color changes to fix the colors of nodes on the path from the inserted
node to the root.

For RB-Delete, consider a complete red-black tree in which all nodes are
black. If a leaf is deleted, then the double blackness will be pushed all the way
up to the root, with a color change at each level (case 2 of RB-Delete-Fixup),
for a total of \(\Omega(\lg n)\) color changes.

b. All cases except for case 1 of RB-Insert-Fixup and case 2 of RB-Delete-Fixup
are terminating.

c. Case 1 of RB-Insert-Fixup reduces the number of red nodes by 1. As Figure 13.5
shows, node z’s parent and uncle change from red to black, and z’s
grandparent changes from black to red. Hence, \(\Phi(T') = \Phi(T) - 1\).

d. Lines 1-16 of RB-Insert cause one node insertion and a unit increase in potential.
The nonterminating case of RB-Insert-Fixup (case 1) makes three
color changes and decreases the potential by 1. The terminating cases of
RB-Insert-Fixup (cases 2 and 3) cause one rotation each and do not affect the
potential. (Although case 3 makes color changes, the potential does not change.
As Figure 13.6 shows, node z’s parent changes from red to black, and z’s grandparent
changes from black to red.)

e. The number of structural modifications and amount of potential change resulting
from lines 1-16 of RB-Insert and from the terminating cases of
RB-Insert-Fixup are \(O(1)\), and so the amortized number of structural modifications
of these parts is \(O(1)\). The nonterminating case of RB-Insert-Fixup
may repeat \(O(\lg n)\) times, but its amortized number of structural modifications
is 0, since by our assumption the unit decrease in the potential pays for the
structural modifications needed. Therefore, the amortized number of structural
modifications performed by RB-Insert is \(O(1)\).

f. From Figure 13.5, we see that case 1 of RB-Insert-Fixup makes the following
changes to the tree:

• Changes a black node with two red children (node C) to a red node, resulting
in a potential change of −2.

• Changes a red node (node A in part (a) and node B in part (b)) to a black
node with one red child, resulting in no potential change.

• Changes a red node (node D) to a black node with no red children, resulting
in a potential change of 1.

The total change in potential is −1, which pays for the structural modifications
performed, and thus the amortized number of structural modifications in case 1
(the nonterminating case) is 0. The terminating cases of RB-Insert-Fixup
cause \(O(1)\) structural changes. Because w(v) is based solely on node colors and
the number of color changes caused by terminating cases is \(O(1)\), the change
in potential in terminating cases is \(O(1)\). Hence, the amortized number of
structural modifications in the terminating cases is \(O(1)\). The overall amortized
number of structural modifications in RB-Insert, therefore, is \(O(1)\).

g. Figure 13.7 shows that case 2 of RB-Delete-Fixup makes the following
changes to the tree:

• Changes a black node with no red children (node D) to a red node, resulting
in a potential change of −1.

• If B is red, then it loses a black child, with no effect on potential.

• If B is black, then it goes from having no red children to having one red
child, resulting in a potential change of −1.

The total change in potential is either −1 or −2, depending on the color of B.
In either case, one unit of potential pays for the structural modifications performed,
and thus the amortized number of structural modifications in case 2
(the nonterminating case) is at most 0. The terminating cases of RB-Delete
cause \(O(1)\) structural changes. Because w(v) is based solely on node colors
and the number of color changes caused by terminating cases is \(O(1)\), the
change in potential in terminating cases is \(O(1)\). Hence, the amortized number
of structural changes in the terminating cases is \(O(1)\). The overall amortized
number of structural modifications in RB-Delete-Fixup, therefore, is \(O(1)\).

h. Since the amortized number of structural modifications in each operation is \(O(1)\),
the actual number of structural modifications for any sequence of \(m\) RB-Insert
and RB-Delete operations on an initially empty red-black tree is
\(O(m)\) in the worst case.



Lecture Notes for Chapter 21: 
Data Structures for Disjoint Sets 


Chapter 21 overview 

Disjoint-set data structures 

• Also known as “union find.” 

• Maintain collection \(\mathcal{S} = \{S_1, \ldots, S_k\}\) of disjoint dynamic (changing over time)
sets.

• Each set is identified by a representative, which is some member of the set. 

Doesn’t matter which member is the representative, as long as if we ask for the 
representative twice without modifying the set, we get the same answer both 
times. 

[We do not include notes for the proof of running time of the disjoint-set forest 
implementation, which is covered in Section 21.4.] 


Operations 


• Make-Set(x): make a new set \(S_i = \{x\}\), and add \(S_i\) to \(\mathcal{S}\).

• Union(x, y): if \(x \in S_x\), \(y \in S_y\), then \(\mathcal{S} \leftarrow \mathcal{S} - S_x - S_y \cup \{S_x \cup S_y\}\).

  • Representative of new set is any member of \(S_x \cup S_y\), often the representative
  of one of \(S_x\) and \(S_y\).

  • Destroys \(S_x\) and \(S_y\) (since sets must be disjoint).

• Find-Set(x): return representative of set containing x.

Analysis in terms of: 

• n = # of elements = # of Make-Set operations, 

• m = total # of operations. 

Analysis:

• Since Make-Set counts toward total # of operations, \(m \ge n\).

• Can have at most \(n - 1\) Union operations, since after \(n - 1\) Unions, only 1
set remains.

• Assume that the first n operations are Make-Set (helpful for analysis, usually 
not really necessary). 

Application: dynamic connected components. 

For a graph G = (V , E ), vertices u, v are in same connected component if and 
only if there’s a path between them. 

• Connected components partition vertices into equivalence classes. 

Connected-Components(V, E)
for each vertex v ∈ V
    do Make-Set(v)
for each edge (u, v) ∈ E
    do if Find-Set(u) ≠ Find-Set(v)
        then Union(u, v)

Same-Component(u, v)
if Find-Set(u) = Find-Set(v)
    then return TRUE
    else return FALSE

Note: If actually implementing connected components, 

• each vertex needs a handle to its object in the disjoint-set data structure, 

• each object in the disjoint-set data structure needs a handle to its vertex. 


Linked list representation 

• Each set is a singly linked list. 

• Each list node has fields for 

• the set member 

• pointer to the representative 

• next 

• List has head (pointer to representative) and tail. 

Make-Set: create a singleton list. 

Find-Set: return pointer to representative. 

Union: a couple of ways to do it. 

1. Union(x, y): append x’s list onto end of y’s list. Use y’s tail pointer to find
the end.

• Need to update the representative pointer for every node on x’s list.

• If appending a large list onto a small list, it can take a while.

Operation                   # objects updated
Union(\(x_1, x_2\))             1
Union(\(x_2, x_3\))             2
Union(\(x_3, x_4\))             3
Union(\(x_4, x_5\))             4
  ...                         ...
Union(\(x_{n-1}, x_n\))         n − 1
                            \(\Theta(n^2)\) total

Amortized time per operation = \(\Theta(n)\).

2. Weighted-union heuristic: Always append the smaller list to the larger list.

A single union can still take \(\Omega(n)\) time, e.g., if both sets have \(n/2\) members.

Theorem

With weighted union, a sequence of \(m\) operations on \(n\) elements takes
\(O(m + n \lg n)\) time.

Sketch of proof: Each Make-Set and Find-Set still takes \(O(1)\). How many
times can each object’s representative pointer be updated? It must be in the
smaller set each time.

times updated    size of resulting set
1                \(\ge 2\)
2                \(\ge 4\)
3                \(\ge 8\)
...              ...
k                \(\ge 2^k\)
...              ...
lg n             \(\ge n\)

Therefore, each representative is updated \(\le \lg n\) times. ■ (theorem)

Seems pretty good, but we can do much better. 
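The weighted-union heuristic can be sketched as follows. To keep the example short, Python lists and dictionaries stand in for the linked lists with head and tail pointers, so this is an assumption-laden illustration rather than the representation in the notes; the class name `WeightedUnionSets` and the `updates` counter are hypothetical.

```python
class WeightedUnionSets:
    """Disjoint sets in which every element stores its representative directly."""

    def __init__(self):
        self.rep = {}      # element -> representative
        self.members = {}  # representative -> list of members
        self.updates = 0   # number of representative pointers rewritten

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        a, b = self.rep[x], self.rep[y]
        if a == b:
            return
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a  # make a the representative of the larger set
        for z in self.members[b]:  # only the smaller set is rewritten
            self.rep[z] = a
            self.updates += 1
        self.members[a].extend(self.members.pop(b))
```

With 16 elements, merging in a chain rewrites 15 pointers in total, and merging in balanced rounds rewrites 32, both within the \(n \lg n = 64\) bound from the proof sketch.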


Disjoint-set forest 

Forest of trees. 

• 1 tree per set. Root is representative. 

• Each node points only to its parent. 

• Make-Set: make a single-node tree.

• Union: make one root a child of the other. 

• Find-Set: follow pointers to the root. 

Not so good—could get a linear chain of nodes. 


Great heuristics 

• Union by rank: make the root of the smaller tree (fewer nodes) a child of the 
root of the larger tree. 

• Don’t actually use size. 

• Use rank, which is an upper bound on height of node. 

• Make the root with the smaller rank into a child of the root with the larger 
rank. 

• Path compression: Find path = nodes visited during Find-Set on the trip to 
the root. Make all nodes on the find path direct children of root. 



Make-Set(x)
p[x] ← x
rank[x] ← 0

Union(x, y)
Link(Find-Set(x), Find-Set(y))

Link(x, y)
if rank[x] > rank[y]
    then p[y] ← x
    else p[x] ← y
         ▷ If equal ranks, choose y as parent and increment its rank.
         if rank[x] = rank[y]
             then rank[y] ← rank[y] + 1

Find-Set(x)
if x ≠ p[x]
    then p[x] ← Find-Set(p[x])
return p[x]

Find-Set makes a pass up to find the root, and a pass down as recursion unwinds 
to update each node on find path to point directly to root. 
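The four procedures carry over to Python almost unchanged. In this sketch the dictionaries `p` and `rank` replace the per-node fields, and the class name `DisjointSetForest` and the helper `connected_components` are illustrative assumptions.

```python
class DisjointSetForest:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self):
        self.p = {}     # parent pointers
        self.rank = {}  # upper bound on the height of each node

    def make_set(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        if x != self.p[x]:
            # Recursion unwinds and points every node on the find path at the root.
            self.p[x] = self.find_set(self.p[x])
        return self.p[x]

    def link(self, x, y):
        if self.rank[x] > self.rank[y]:
            self.p[y] = x
        else:
            self.p[x] = y
            # If equal ranks, y becomes the parent and its rank goes up by 1.
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1

    def union(self, x, y):
        self.link(self.find_set(x), self.find_set(y))


def connected_components(vertices, edges):
    """Build the forest for an undirected graph given as vertex and edge lists."""
    forest = DisjointSetForest()
    for v in vertices:
        forest.make_set(v)
    for u, v in edges:
        if forest.find_set(u) != forest.find_set(v):
            forest.union(u, v)
    return forest
```

Two vertices are in the same component exactly when `find_set` returns the same root for both. Because ranks bound tree heights, the recursion in `find_set` stays shallow.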

Running time 

If use both union by rank and path compression, \(O(m \, \alpha(n))\).

n                    \(\alpha(n)\)
0–2                  0
3                    1
4–7                  2
8–2047               3
2048–\(A_4(1)\)         4

What’s \(A_4(1)\)? See Section 21.4, if you dare. It’s \(\gg 10^{80} \approx\) # of atoms in observable
universe.

This bound is tight—there is a sequence of operations that takes \(\Omega(m \, \alpha(n))\) time.
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Solution to Exercise 21.2-3 

We want to show that we can assign O(1) charges to Make-Set and Find-Set
and an O(lg n) charge to Union such that the charges for a sequence of these
operations are enough to cover the cost of the sequence, which is O(m + n lg n)
according to the theorem. When talking about the charge for each kind of
operation, it is helpful to also be able to talk about the number of each kind
of operation.

Consider the usual sequence of m Make-Set, Union, and Find-Set operations,
n of which are Make-Set operations, and let l < n be the number of Union
operations. (Recall the discussion in Section 21.1 about there being at most n − 1
Union operations.) Then there are n Make-Set operations, l Union operations,
and m − n − l Find-Set operations.

The theorem didn’t separately name the number l of Unions; rather, it bounded
the number by n. If you go through the proof of the theorem with l Unions, you
get the time bound O(m − l + l lg l) = O(m + l lg l) for the sequence of operations.
That is, the actual time taken by the sequence of operations is at most c(m + l lg l),
for some constant c.

Thus, we want to assign operation charges such that 

  (Make-Set charge) · n
+ (Find-Set charge) · (m − n − l)
+ (Union charge) · l
≥ c(m + l lg l) ,

so that the amortized costs give an upper bound on the actual costs.

The following assignments work, where c′ is some constant ≥ c:

• Make-Set: c′

• Find-Set: c′

• Union: c′(lg n + 1)

Substituting into the above sum, we get

c′n + c′(m − n − l) + c′(lg n + 1)l = c′m + c′l lg n
                                    = c′(m + l lg n)
                                    ≥ c(m + l lg l) ,

where the last step uses c′ ≥ c and l < n.


Solution to Exercise 21.2-5

Let’s call the two lists A and B, and suppose that the representative of the new list 
will be the representative of A. Rather than appending B to the end of A, instead 
splice B into A right after the first element of A. We have to traverse B to update 
representative pointers anyway, so we can just make the last element of B point to 
the second element of A. 


Solution to Exercise 21.3-3 

You need to find a sequence of m operations on n elements that takes Ω(m lg n)
time. Start with n Make-Sets to create singleton sets {x_1}, {x_2}, ..., {x_n}. Next
perform the n − 1 Union operations shown below to create a single set whose tree
has depth lg n.

Union(x_1, x_2)
Union(x_3, x_4)
Union(x_5, x_6)          n/2 of these
   ...
Union(x_{n−1}, x_n)

Union(x_2, x_4)
Union(x_6, x_8)
Union(x_10, x_12)        n/4 of these
   ...
Union(x_{n−2}, x_n)

Union(x_4, x_8)
Union(x_12, x_16)
Union(x_20, x_24)        n/8 of these
   ...
Union(x_{n−4}, x_n)

   ...

Union(x_{n/2}, x_n)      1 of these

Finally, perform m − 2n + 1 Find-Set operations on the deepest element in the
tree. Each of these Find-Set operations takes Ω(lg n) time. Letting m ≥ 3n, we
have more than m/3 Find-Set operations, so that the total cost is Ω(m lg n).
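
[To check the construction on a small case, the following Python sketch runs
the Union sequence above with union by rank and no path compression, then
measures depths. The function names build_union_chain and depth are made up
for this illustration, and n is assumed to be a power of 2.]

```python
def build_union_chain(n):
    """Perform the n - 1 Unions of the construction on x_1..x_n.

    Uses union by rank only (no path compression). Returns the parent map.
    Assumes n is a power of 2.
    """
    parent = {i: i for i in range(1, n + 1)}
    rank = {i: 0 for i in range(1, n + 1)}

    def find(x):                    # plain Find-Set, no compression
        while parent[x] != x:
            x = parent[x]
        return x

    def union(x, y):
        x, y = find(x), find(y)
        if rank[x] > rank[y]:
            parent[y] = x
        else:
            parent[x] = y
            if rank[x] == rank[y]:
                rank[y] += 1

    step = 1
    while step < n:                 # rounds of n/2, n/4, ..., 1 Unions
        for i in range(step, n, 2 * step):
            union(i, i + step)
        step *= 2
    return parent


def depth(parent, x):
    """Number of parent pointers followed from x to its root."""
    d = 0
    while parent[x] != x:
        x = parent[x]
        d += 1
    return d
```

[For n = 16 the deepest element is x_1, at depth lg 16 = 4, so each Find-Set
on it follows 4 pointers.]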


Solution to Exercise 21.3-4 

With the path-compression heuristic, the sequence of m Make-Set, Find-Set,
and Link operations, where all the Link operations take place before any of the
Find-Set operations, runs in O(m) time. The key observation is that once a
node x appears on a find path, x will be either a root or a child of a root at all times
thereafter.

We use the accounting method to obtain the O(m) time bound. We charge a
Make-Set operation two dollars. One dollar pays for the Make-Set, and one
dollar remains on the node x that is created. The latter pays for the first time that x
appears on a find path and is turned into a child of a root.

We charge one dollar for a Link operation. This dollar pays for the actual linking 
of one node to another. 

We charge one dollar for a Find-Set. This dollar pays for visiting the root and 
its child, and for the path compression of these two nodes, during the Find-Set. 
All other nodes on the find path use their stored dollar to pay for their visitation 
and path compression. As mentioned, after the Find-Set, all nodes on the find 
path become children of a root (except for the root itself), and so whenever they 
are visited during a subsequent Find-Set, the Find-Set operation itself will pay 
for them. 

Since we charge each operation either one or two dollars, a sequence of m
operations is charged at most 2m dollars, and so the total time is O(m).

Observe that nothing in the above argument requires union by rank. Therefore, we 
get an O(m) time bound regardless of whether we use union by rank. 


Solution to Exercise 21.4-4 

Clearly, each Make-Set and Link operation takes O(1) time. Because the rank
of a node is an upper bound on its height, and every rank is at most ⌊lg n⌋,
each find path has length O(lg n), which in turn implies that each Find-Set
takes O(lg n) time. Thus, any sequence of m Make-Set, Link, and Find-Set
operations on n elements takes O(m lg n) time. It is easy to prove an analogue
of Lemma 21.7 to show that if we convert a sequence of m′ Make-Set, Union,
and Find-Set operations into a sequence of m Make-Set, Link, and Find-Set
operations that take O(m lg n) time, then the sequence of m′ Make-Set, Union,
and Find-Set operations takes O(m′ lg n) time.


Solution to Exercise 21.4-5 

Professor Dante is mistaken. Take the following scenario. Let n = 16, and make
16 separate singleton sets using Make-Set. Then do 8 Union operations to link
the sets into 8 pairs, where each pair has a root with rank 1 and a child with rank 0.
Now do 4 Unions to link pairs of these trees, so that there are 4 trees, each with a
root of rank 2, children of the root of ranks 1 and 0, and a node of rank 0 that is the
child of the rank-1 node. Now link pairs of these trees together, so that there are
two resulting trees, each with a root of rank 3 and each containing a path from a
leaf to the root with ranks 0, 1, and 3. Finally, link these two trees together, so that
there is a path from a leaf to the root with ranks 0, 1, 3, and 4. Let x and y be the
nodes on this path with ranks 1 and 3, respectively. Since A_1(1) = 3, level(x) = 1,
and since A_0(3) = 4, level(y) = 0. Yet y follows x on the find path.


Solution to Exercise 21.4-6 

First, α′(2^2047 − 1) = min {k : A_k(1) ≥ 2047} = 3, and 2^2047 − 1 ≫ 10^80.

Second, we need that 0 ≤ level(x) < α′(n) for all nonroots x with rank[x] ≥ 1.
With this definition of α′(n), we have A_{α′(n)}(rank[x]) ≥ A_{α′(n)}(1) ≥ lg(n + 1) >
lg n ≥ rank[p[x]]. The rest of the proof goes through with α′(n) replacing α(n).


Solution to Problem 21-1 


a. For the input sequence

4, 8, E, 3, E, 9, 2, 6, E, E, E, 1, 7, E, 5 ,

the values in the extracted array would be 4, 3, 2, 6, 8, 1.

The following table shows the situation after the ith iteration of the for loop
when we use Off-Line-Minimum on the same input. (For this input, n = 9
and m, the number of extractions, is 6.) A blank K_j entry means that the set has
been destroyed; a dash means that the extracted entry has not been filled in yet.

i  K_1    K_2  K_3      K_4            K_5            K_6    K_7                  extracted[1..6]
0  {4,8}  {3}  {9,2,6}  {}             {}             {1,7}  {5}                  -  -  -  -  -  -
1  {4,8}  {3}  {9,2,6}  {}             {}                    {5,1,7}              -  -  -  -  -  1
2  {4,8}  {3}           {9,2,6}        {}                    {5,1,7}              -  -  2  -  -  1
3  {4,8}                {9,2,6,3}      {}                    {5,1,7}              -  3  2  -  -  1
4                       {9,2,6,3,4,8}  {}                    {5,1,7}              4  3  2  -  -  1
5                       {9,2,6,3,4,8}  {}                    {5,1,7}              4  3  2  -  -  1
6                                      {9,2,6,3,4,8}         {5,1,7}              4  3  2  6  -  1
7                                      {9,2,6,3,4,8}         {5,1,7}              4  3  2  6  -  1
8                                                            {5,1,7,9,2,6,3,4,8}  4  3  2  6  8  1

Because j = m + 1 in the iterations for i = 5 and i = 7, no changes occur in
these iterations.


b. We want to show that the array extracted returned by Off-Line-Minimum is
correct, meaning that for j = 1, 2, ..., m, extracted[j] is the key returned by
the jth Extract-Min call.

We start with n Insert operations and m Extract-Min operations. The
smallest of all the elements will be extracted in the first Extract-Min after
its insertion. So we find j such that the minimum element is in K_j, and put the
minimum element in extracted[j], which corresponds to the Extract-Min
after the minimum element insertion.

Now we reduce to a similar problem with n − 1 Insert operations and m − 1
Extract-Min operations in the following way: the Insert operations are
the same but without the insertion of the smallest that was extracted, and the
Extract-Min operations are the same but without the extraction that
extracted the smallest element.

Conceptually, we unite I_j and I_{j+1}, removing the extraction between them and
also removing the insertion of the minimum element from I_j ∪ I_{j+1}. Uniting I_j
and I_{j+1} is accomplished by line 6. We need to determine which set is K_l, rather
than just using K_{j+1} unconditionally, because K_{j+1} may have been destroyed
when it was united into a higher-indexed set by a previous execution of line 6.

Because we process extractions in increasing order of the minimum value
found, the remaining iterations of the for loop correspond to solving the
reduced problem.

There are two other points worth making. First, if the smallest remaining
element had been inserted after the last Extract-Min (i.e., j = m + 1), then
no changes occur, because this element is not extracted. Second, there may be
smaller elements within the K_j sets than the one we are currently looking
for. These elements do not affect the result, because they correspond to
elements that were already extracted, and their effect on the algorithm’s execution
is over.

c. To implement this algorithm, we place each element in a disjoint-set forest.
Each root has a pointer to its K_i set, and each K_i set has a pointer to the root of
the tree representing it. All the valid sets K_i are in a linked list.

Before Off-Line-Minimum, there is initialization that builds the initial sets K_i
according to the I_i sequences.

• Line 2 (“determine j such that i ∈ K_j”) turns into j ← Find-Set(i).

• Line 5 (“let l be the smallest value greater than j for which set K_l exists”)
turns into K_l ← next[K_j].

• Line 6 (“K_l ← K_j ∪ K_l, destroying K_j”) turns into l ← Link(j, l) and
remove K_j from the linked list.

To analyze the running time, we note that there are n elements and that we have
the following disjoint-set operations:

• n Make-Set operations

• at most n − 1 Union operations before starting

• n Find-Set operations

• at most n Link operations

Thus the number m of overall operations is O(n). The total running time is
O(m α(n)) = O(n α(n)).

[The “tight bound” wording that this question uses does not refer to an
“asymptotically tight” bound. Instead, the question is merely asking for a bound
that is not too “loose.”]
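
[One possible rendering of part (c) in Python, as a sketch under stated
assumptions. The names off_line_minimum, root_of, set_of, nxt, and prv are
invented here. The input is a list containing each key 1..n once plus the
string "E" for each Extract-Min, and every Extract-Min is assumed to find a
nonempty set. For brevity the trees are linked without union by rank, so this
sketch does not by itself achieve the O(n α(n)) bound.]

```python
def off_line_minimum(seq):
    """Return extracted[1..m] for a sequence of keys 1..n and 'E' markers."""
    n = sum(1 for t in seq if t != "E")
    m = len(seq) - n
    parent = list(range(n + 1))          # disjoint-set forest over keys

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:         # path compression
            parent[x], x = root, parent[x]
        return root

    # root_of[j]: root of the tree for K_j (None while K_j is empty).
    # set_of[r]:  index j of the set whose tree has root r.
    root_of = [None] * (m + 2)
    set_of = {}
    j = 1
    for t in seq:                        # build K_1..K_{m+1} from I_1..I_{m+1}
        if t == "E":
            j += 1
        elif root_of[j] is None:
            root_of[j] = t
            set_of[t] = j
        else:
            parent[t] = root_of[j]

    nxt = [k + 1 for k in range(m + 2)]  # linked list of existing sets
    prv = [k - 1 for k in range(m + 2)]
    extracted = [None] * (m + 1)
    for i in range(1, n + 1):
        r = find(i)                      # line 2: find j with i in K_j
        j = set_of[r]
        if j != m + 1:
            extracted[j] = i
            l = nxt[j]                   # line 5: next existing set
            if root_of[l] is None:       # line 6: K_l <- K_j U K_l
                root_of[l] = r
                set_of[r] = l
            else:
                parent[r] = root_of[l]
            nxt[prv[j]] = l              # destroy K_j
            prv[l] = prv[j]
    return extracted[1:]
```

[On the input of part (a), off_line_minimum([4, 8, "E", 3, "E", 9, 2, 6, "E",
"E", "E", 1, 7, "E", 5]) gives [4, 3, 2, 6, 8, 1].]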
Solution to Problem 21-2

a. Denote the number of nodes by n, and let n = (m + 1)/3, so that m =
3n − 1. First, perform the n operations Make-Tree(v_1), Make-Tree(v_2),
..., Make-Tree(v_n). Then perform the sequence of n − 1 Graft operations
Graft(v_1, v_2), Graft(v_2, v_3), ..., Graft(v_{n−1}, v_n); this sequence produces
a single disjoint-set tree that is a linear chain of n nodes with v_n at the root
and v_1 as the only leaf. Then perform Find-Depth(v_1) repeatedly, n times.
The total number of operations is n + (n − 1) + n = 3n − 1 = m.

Each Make-Tree and Graft operation takes O(1) time. Each Find-Depth
operation has to follow an n-node find path, and so each of the n Find-Depth
operations takes Θ(n) time. The total time is n · Θ(n) + (2n − 1) · O(1) =
Θ(n^2) = Θ(m^2).

b. Make-Tree is like Make-Set, except that it also sets the d value to 0:

Make-Tree(v)
  p[v] ← v
  rank[v] ← 0
  d[v] ← 0

It is correct to set d[v] to 0, because the depth of the node in the single-node
disjoint-set tree is 0, and the sum of the depths on the find path for v consists
only of d[v].

c. Find-Depth will call a procedure Find-Root:

Find-Root(v)
  if p[v] ≠ p[p[v]]
    then y ← p[v]
         p[v] ← Find-Root(y)
         d[v] ← d[v] + d[y]
  return p[v]

Find-Depth(v)
  Find-Root(v)            ▷ No need to save the return value.
  if v = p[v]
    then return d[v]
    else return d[v] + d[p[v]]

Find-Root performs path compression and updates pseudodistances along the
find path from v. It is similar to Find-Set on page 508, but with three changes.
First, when v is either the root or a child of a root (one of these conditions holds
if and only if p[v] = p[p[v]]) in the disjoint-set forest, we don’t have to
recurse; instead, we just return p[v]. Second, when we do recurse, we save the
pointer p[v] into a new variable y. Third, when we recurse, we update d[v] by
adding into it the d values of all nodes on the find path that are no longer proper
ancestors of v after path compression; these nodes are precisely the proper
ancestors of v other than the root. Thus, as long as v does not start out the
Find-Root call as either the root or a child of the root, we add d[y] into d[v]. Note
that d[y] has been updated prior to updating d[v], if y is also neither the root
nor a child of the root.

Find-Depth first calls Find-Root to perform path compression and update
pseudodistances. Afterward, the find path from v consists of either just v (if v
is a root) or just v and p[v] (if v is not a root, in which case it is a child of the
root after path compression). In the former case, the depth of v is just d[v], and
in the latter case, the depth is d[v] + d[p[v]].

d. Our procedure for Graft is a combination of Union and Link:

Graft(r, v)
  r′ ← Find-Root(r)
  v′ ← Find-Root(v)
  z ← Find-Depth(v)
  if rank[r′] > rank[v′]
    then p[v′] ← r′
         d[r′] ← d[r′] + z + 1
         d[v′] ← d[v′] − d[r′]
    else p[r′] ← v′
         d[r′] ← d[r′] + z + 1 − d[v′]
         if rank[r′] = rank[v′]
           then rank[v′] ← rank[v′] + 1

This procedure works as follows. First, we call Find-Root on r and v in
order to find the roots r′ and v′, respectively, of their trees in the disjoint-set
forest. As we saw in part (c), these Find-Root calls also perform path
compression and update pseudodistances on the find paths from r and v. We then
call Find-Depth(v), saving the depth of v in the variable z. (Since we have
just compressed v’s find path, this call of Find-Depth takes O(1) time.) Next,
we emulate the action of Link, by making the root (r′ or v′) of smaller rank a
child of the root of larger rank; in case of a tie, we make r′ a child of v′.

If v′ has the smaller rank, then all nodes in r’s tree will have their depths
increased by the depth of v plus 1 (because r is to become a child of v). Altering
the pseudodistance of the root of a disjoint-set tree changes the computed depth
of all nodes in that tree, and so adding z + 1 to d[r′] accomplishes this update
for all nodes in r’s disjoint-set tree. Since v′ will become a child of r′ in the
disjoint-set forest, we have just increased the computed depth of all nodes in
the disjoint-set tree rooted at v′ by d[r′]. These computed depths should not
have changed, however. Thus, we subtract off d[r′] from d[v′], so that the sum
d[v′] + d[r′] after making v′ a child of r′ equals d[v′] before making v′ a child
of r′.

On the other hand, if r′ has the smaller rank, or if the ranks are equal, then r′
becomes a child of v′ in the disjoint-set forest. In this case, v′ remains a root
in the disjoint-set forest afterward, and we can leave d[v′] alone. We have to
update d[r′], however, so that after making r′ a child of v′, the depth of each
node in r’s disjoint-set tree is increased by z + 1. We add z + 1 to d[r′], but we
also subtract out d[v′], since we have just made r′ a child of v′. Finally, if the
ranks of r′ and v′ are equal, we increment the rank of v′, as is done in the Link
procedure.

e. The asymptotic running times of Make-Tree, Find-Depth, and Graft are
equivalent to those of Make-Set, Find-Set, and Union, respectively. Thus,
a sequence of m operations, n of which are Make-Tree operations, takes
O(m α(n)) time in the worst case.
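
[A Python sketch of parts (b) through (d), offered as one way to try the
pseudodistance bookkeeping on small inputs. The class name DepthForest and the
dictionary storage are assumptions of this illustration; the method bodies
follow Make-Tree, Find-Root, Find-Depth, and Graft line by line.]

```python
class DepthForest:
    """Disjoint-set forest that also tracks node depths via pseudodistances."""

    def __init__(self):
        self.p = {}       # parent in the disjoint-set forest
        self.rank = {}
        self.d = {}       # pseudodistance

    def make_tree(self, v):
        self.p[v] = v
        self.rank[v] = 0
        self.d[v] = 0

    def find_root(self, v):
        # Recurse only if v is neither the root nor a child of the root.
        if self.p[v] != self.p[self.p[v]]:
            y = self.p[v]
            self.p[v] = self.find_root(y)
            self.d[v] += self.d[y]     # d[y] is already up to date here
        return self.p[v]

    def find_depth(self, v):
        self.find_root(v)              # compress; return value not needed
        if v == self.p[v]:
            return self.d[v]
        return self.d[v] + self.d[self.p[v]]

    def graft(self, r, v):
        """Make r (a root of a real tree) a child of v (in another tree)."""
        r_root = self.find_root(r)
        v_root = self.find_root(v)
        z = self.find_depth(v)
        if self.rank[r_root] > self.rank[v_root]:
            self.p[v_root] = r_root
            self.d[r_root] += z + 1
            self.d[v_root] -= self.d[r_root]
        else:
            self.p[r_root] = v_root
            self.d[r_root] += z + 1 - self.d[v_root]
            if self.rank[r_root] == self.rank[v_root]:
                self.rank[v_root] += 1
```

[For example, after make_tree on 1..5 and graft(1, 2), graft(2, 3),
graft(3, 4), graft(4, 5), the real forest is the chain 1 → 2 → 3 → 4 → 5, and
find_depth returns 4, 3, 2, 1, 0 for vertices 1 through 5.]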
Graph representation

Given graph G = (V, E).

• May be either directed or undirected.

• Two common ways to represent for algorithms:

1. Adjacency lists.

2. Adjacency matrix.

When expressing the running time of an algorithm, it’s often in terms of both |V|
and |E|. In asymptotic notation (and only in asymptotic notation) we’ll drop the
cardinality. Example: O(V + E).

[The introduction to Part VI talks more about this.] 


Adjacency lists 


Array Adj of |V| lists, one per vertex.

Vertex u’s list has all vertices v such that (u, v) ∈ E. (Works for both directed and
undirected graphs.)

Example: For an undirected graph:

[Figure: undirected graph on vertices 1–5 and its adjacency lists Adj[1..5].]

If edges have weights, can put the weights in the lists.

Weight: w : E → R

We’ll use weights later on for spanning trees and shortest paths.

Space: Θ(V + E).

Time: to list all vertices adjacent to u: Θ(degree(u)).

Time: to determine if (u, v) ∈ E: O(degree(u)).
Example: For a directed graph:

[Figure: directed graph on vertices 1–4 and its adjacency lists Adj[1..4].]

Same asymptotic space and time.


Adjacency matrix 


|V| × |V| matrix A = (a_ij)

a_ij = 1 if (i, j) ∈ E ,
       0 otherwise .

Undirected example:

     1  2  3  4  5
1    0  1  0  0  1
2    1  0  1  1  1
3    0  1  0  1  0
4    0  1  1  0  1
5    1  1  0  1  0

Directed example:

     1  2  3  4
1    0  1  0  0
2    0  0  0  1
3    1  1  0  0
4    0  0  1  1

Space: Θ(V^2).

Time: to list all vertices adjacent to u: Θ(V).

Time: to determine if (u, v) ∈ E: Θ(1).

Can store weights instead of bits for weighted graph. 
We’ll use both representations in these lecture notes. 
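
[A small Python sketch contrasting the two representations on a 5-vertex
undirected graph. The helper names matrix_to_adj and adj_to_matrix are
invented for this example; vertices are numbered from 1 as in the notes, while
Python lists are indexed from 0.]

```python
# Adjacency matrix of a 5-vertex undirected graph (row/column k is vertex k+1).
A = [
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
]


def matrix_to_adj(A):
    """Theta(V^2): scan every entry to build the adjacency lists."""
    n = len(A)
    return {u + 1: [v + 1 for v in range(n) if A[u][v]] for u in range(n)}


def adj_to_matrix(adj):
    """Theta(V^2) space for the matrix, Theta(V + E) work to fill in the 1s."""
    n = len(adj)
    M = [[0] * n for _ in range(n)]
    for u, neighbors in adj.items():
        for v in neighbors:
            M[u - 1][v - 1] = 1
    return M


adj = matrix_to_adj(A)
# Edge test: O(degree(u)) with lists, Theta(1) with the matrix.
has_edge_list = 4 in adj[2]
has_edge_matrix = A[2 - 1][4 - 1] == 1
```

[Because the graph is undirected, each edge appears in two lists, so the list
lengths sum to 2|E|.]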


Breadth-first search 

Input: Graph G = (V, E), either directed or undirected, and source vertex s ∈ V.

Output: d[v] = distance (smallest # of edges) from s to v, for all v ∈ V.

In book, also π[v] = u such that (u, v) is last edge on shortest path s ⇝ v.

• u is v’s predecessor.

• set of edges {(π[v], v) : v ≠ s} forms a tree.

Later, we’ll see a generalization of breadth-first search, with edge weights. For
now, we’ll keep it simple.

• Compute only d[v], not π[v]. [See book for π[v].]

• Omitting colors of vertices. [Used in book to reason about the algorithm. We’ll
skip them here.]
Idea: Send a wave out from s.

• First hits all vertices 1 edge from s. 

• From there, hits all vertices 2 edges from s. 

• Etc. 

Use FIFO queue Q to maintain wavefront. 

• v ∈ Q if and only if wave has hit v but has not come out of v yet.

BFS(V, E, s)
  for each u ∈ V − {s}
    do d[u] ← ∞
  d[s] ← 0
  Q ← ∅
  Enqueue(Q, s)
  while Q ≠ ∅
    do u ← Dequeue(Q)
       for each v ∈ Adj[u]
         do if d[v] = ∞
              then d[v] ← d[u] + 1
                   Enqueue(Q, v)

Example: directed graph [undirected example in book] . 



Can show that Q consists of vertices with d values. 
i i i ... i i + 1 i + 1 ... i + 1 

• Only 1 or 2 values. 

• If 2, differ by 1 and all smallest are first. 

Since each vertex gets a finite d value at most once, values assigned to vertices are 
monotonically increasing over time. 

Actual proof of correctness is a bit trickier. See book. 

BFS may not reach all vertices. 

Time = O(V + E).

• O(V) because every vertex enqueued at most once.

• O(E) because every vertex dequeued at most once and we examine (u, v) only
when u is dequeued. Therefore, every edge examined at most once if directed,
at most twice if undirected.
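
[The BFS procedure translates almost directly into Python. In this sketch the
graph is assumed to be a dictionary mapping each vertex to its adjacency list,
and collections.deque serves as the FIFO queue; the function name bfs and the
sample graph are made up for illustration.]

```python
from collections import deque
import math


def bfs(adj, s):
    """Return d, where d[v] is the number of edges on a shortest path s ~> v.

    Vertices that the wave never reaches keep d[v] = infinity.
    """
    d = {u: math.inf for u in adj}
    d[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()              # Dequeue
        for v in adj[u]:
            if d[v] == math.inf:     # wave has not hit v yet
                d[v] = d[u] + 1
                q.append(v)          # Enqueue
    return d


# Hypothetical directed graph; 'x' has an edge into s but is not reachable from s.
example = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": [], "x": ["s"]}
distances = bfs(example, "s")
```

[Here distances is 0 for s, 1 for a and b, 2 for c, and infinity for x, which
illustrates that BFS may not reach all vertices.]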
Depth-first search

Input: G = (V, E), directed or undirected. No source vertex given!

Output: 2 timestamps on each vertex:

• d[v] = discovery time

• f[v] = finishing time

These will be useful for other algorithms later on.

Can also compute π[v]. [See book.]

Will methodically explore every edge. 

• Start over from different vertices as necessary. 

As soon as we discover a vertex, explore from it. 

• Unlike BFS, which puts a vertex on a queue so that we explore from it later. 
As DFS progresses, every vertex has a color: 

• WHITE = undiscovered 

• GRAY = discovered, but not finished (not done exploring from it) 

• BLACK = finished (have found everything reachable from it) 

Discovery and finish times: 

• Unique integers from 1 to 2|V|.

• For all v, d[v] < f[v].

In other words, 1 ≤ d[v] < f[v] ≤ 2|V|.

Pseudocode: Uses a global timestamp time. 

DFS(V, E)
  for each u ∈ V
    do color[u] ← WHITE
  time ← 0
  for each u ∈ V
    do if color[u] = WHITE
         then DFS-Visit(u)

DFS-Visit(u)
  color[u] ← GRAY              ▷ discover u
  time ← time + 1
  d[u] ← time
  for each v ∈ Adj[u]          ▷ explore (u, v)
    do if color[v] = WHITE
         then DFS-Visit(v)
  color[u] ← BLACK
  time ← time + 1
  f[u] ← time                  ▷ finish u
Example: [Go through this example, adding in the d and f values as they’re
computed. Show colors as they change. Don’t put in the edge types yet.]

Time = Θ(V + E).

• Similar to BFS analysis.

• Θ, not just O, since guaranteed to examine every vertex and edge.
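
[A Python sketch of DFS with timestamps. The graph is assumed to be a
dictionary of adjacency lists, and the function name dfs and the sample graph
are illustrative only. The recursion mirrors DFS-Visit, so very deep graphs
would hit Python's recursion limit.]

```python
def dfs(adj):
    """Return (d, f): discovery and finishing times for every vertex."""
    color = {u: "WHITE" for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = "GRAY"            # discover u
        time += 1
        d[u] = time
        for v in adj[u]:             # explore (u, v)
            if color[v] == "WHITE":
                visit(v)
        color[u] = "BLACK"           # finish u
        time += 1
        f[u] = time

    for u in adj:                    # start over from new vertices as necessary
        if color[u] == "WHITE":
            visit(u)
    return d, f


# Hypothetical 4-vertex directed graph.
d, f = dfs({1: [2], 2: [3], 3: [1], 4: [3]})
```

[In this run the timestamps are d = {1: 1, 2: 2, 3: 3, 4: 7} and
f = {1: 6, 2: 5, 3: 4, 4: 8}: each of the integers 1 through 2|V| = 8 is used
exactly once.]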

DFS forms a depth-first forest comprised of ≥ 1 depth-first trees. Each tree is
made of edges (u, v) such that u is gray and v is white when (u, v) is explored.

Theorem (Parenthesis theorem) 

[Proof omitted.] 

For all u, v, exactly one of the following holds:

1. d[u] < f[u] < d[v] < f[v] or d[v] < f[v] < d[u] < f[u] and neither of u
and v is a descendant of the other.

2. d[u] < d[v] < f[v] < f[u] and v is a descendant of u.

3. d[v] < d[u] < f[u] < f[v] and u is a descendant of v.

So d[u] < d[v] < f[u] < f[v] cannot happen.

Like parentheses:

• OK: ()[] ([]) [()]

• Not OK: ([)] [(])

Corollary

v is a proper descendant of u if and only if d[u] < d[v] < f[v] < f[u].

Theorem (White-path theorem)

[Proof omitted.]

v is a descendant of u if and only if at time d[u], there is a path u ⇝ v consisting
of only white vertices. (Except for u, which was just colored gray.)
Classification of edges

• Tree edge: in the depth-first forest. Found by exploring (u, v).

• Back edge: (u, v), where u is a descendant of v.

• Forward edge: (u, v), where v is a descendant of u, but not a tree edge.

• Cross edge: any other edge. Can go between vertices in same depth-first tree
or in different depth-first trees.

[Now label the example from above with edge types.]

In an undirected graph, there may be some ambiguity since (u, v) and (v, u) are
the same edge. Classify by the first type above that matches.

Theorem 
[Proof omitted.] 

In DFS of an undirected graph, we get only tree and back edges. No forward or 
cross edges. 
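
[For a directed graph, the edge types can be read off during DFS from the
color of v when (u, v) is explored: white gives a tree edge, gray a back edge,
and black a forward edge if d[u] < d[v] or a cross edge otherwise. That rule
comes from the book rather than from these notes. The Python sketch below
applies it; the function name classify_edges and the sample graph are
hypothetical.]

```python
def classify_edges(adj):
    """Map each directed edge (u, v) to 'tree', 'back', 'forward', or 'cross'."""
    color = {u: "WHITE" for u in adj}
    d = {}
    kinds = {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = "GRAY"
        time += 1
        d[u] = time
        for v in adj[u]:
            if color[v] == "WHITE":
                kinds[(u, v)] = "tree"
                visit(v)
            elif color[v] == "GRAY":
                kinds[(u, v)] = "back"       # v is an ancestor of u
            elif d[u] < d[v]:
                kinds[(u, v)] = "forward"    # v is a finished descendant of u
            else:
                kinds[(u, v)] = "cross"
        color[u] = "BLACK"

    for u in adj:
        if color[u] == "WHITE":
            visit(u)
    return kinds


kinds = classify_edges({1: [2, 3], 2: [3], 3: [1], 4: [3]})
```

[In this example (1, 2) and (2, 3) are tree edges, (3, 1) is a back edge,
(1, 3) is a forward edge, and (4, 3) is a cross edge between two depth-first
trees.]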


Topological sort 

Directed acyclic graph (dag) 

A directed graph with no cycles. 

Good for modeling processes and structures that have a partial order: 

• a > b and b > c ⇒ a > c.

• But may have a and b such that neither a > b nor b > a.

Can always make a total order (either a > b or b > a for all a ≠ b) from a partial
order. In fact, that’s what a topological sort will do.

Example: dag of dependencies for putting on goalie equipment: [Leave on board, 
but show without discovery and finish times. Will put them in later.] 
Lemma

A directed graph G is acyclic if and only if a DFS of G yields no back edges.

Proof ⇒ : Show that back edge ⇒ cycle.

Suppose there is a back edge (u, v). Then v is ancestor of u in depth-first forest.

Therefore, there is a path v ⇝ u, so v ⇝ u → v is a cycle.

⇐ : Show that cycle ⇒ back edge.

Suppose G contains cycle c. Let v be the first vertex discovered in c, and let (u, v)
be the preceding edge in c. At time d[v], vertices of c form a white path v ⇝ u
(since v is the first vertex discovered in c). By white-path theorem, u is descendant
of v in depth-first forest. Therefore, (u, v) is a back edge. ■ (lemma)

Topological sort of a dag: a linear ordering of vertices such that if (u, v) ∈ E,
then u appears somewhere before v. (Not like sorting numbers.)

Topological-Sort(V, E)
  call DFS(V, E) to compute finishing times f[v] for all v ∈ V
  output vertices in order of decreasing finish times

Don’t need to sort by finish times. 

• Can just output vertices as they’re finished and understand that we want the 
reverse of this list. 

• Or put them onto the front of a linked list as they’re finished. When done, the 
list contains vertices in topologically sorted order. 

Time: Θ(V + E).
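
[A Python sketch of the second variant above: append each vertex as it
finishes, then reverse the list. The function name topological_sort and the
five-vertex dag are made up for illustration, and the input is assumed to be
acyclic (the sketch does not check for back edges).]

```python
def topological_sort(adj):
    """Return the vertices of a dag in topologically sorted order."""
    visited = set()
    finished = []                    # vertices in increasing finish time

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finished.append(u)           # u is finished

    for u in adj:
        if u not in visited:
            visit(u)
    finished.reverse()               # decreasing finish time
    return finished


# Hypothetical dag: a -> b -> d, a -> c -> d, e -> c.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["c"]}
order = topological_sort(dag)
```

[Here order is ["e", "a", "c", "b", "d"]. Other valid orders exist; which one
comes out depends on the order in which DFS considers vertices and edges.]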

Do example. [Now write discovery and finish times in goalie equipment example.] 
Order:

f[v]   vertex
26     socks
24     shorts
23     hose
22     pants
21     skates
20     leg pads
14     t-shirt
13     chest pad
12     sweater
11     mask
6      batting glove
5      catch glove
4      blocker
Correctness: Just need to show if (u, v) ∈ E, then f[v] < f[u].
When we explore (u, v), what are the colors of u and v?

• u is gray.

• Is v gray, too?

• No, because then v would be ancestor of u.

⇒ (u, v) is a back edge.

⇒ contradiction of previous lemma (dag has no back edges).

• Is v white?

• Then v becomes descendant of u.

By parenthesis theorem, d[u] < d[v] < f[v] < f[u].

• Is v black?

• Then v is already finished.

Since we’re exploring (u, v), we have not yet finished u.
Therefore, f[v] < f[u].


Strongly connected components 

Given directed graph G = (V, E). 

A strongly connected component (SCC) of G is a maximal set of vertices C ⊆ V
such that for all u, v ∈ C, both u ⇝ v and v ⇝ u.

Example: [Just show SCC’s at first. Do DFS a little later.] 
Algorithm uses G^T = transpose of G.

• G^T = (V, E^T), E^T = {(u, v) : (v, u) ∈ E}.

• G^T is G with all edges reversed.

Can create G^T in Θ(V + E) time if using adjacency lists.

Observation: G and G^T have the same SCC’s. (u and v are reachable from each
other in G if and only if reachable from each other in G^T.)


Component graph 

• G^SCC = (V^SCC, E^SCC)

• V^SCC has one vertex for each SCC in G.

• E^SCC has an edge if there’s an edge between the corresponding SCC’s in G.

For our example:

[Figure: component graph of the example.]

Lemma

G^SCC is a dag. More formally, let C and C′ be distinct SCC’s in G, let u, v ∈ C,
u′, v′ ∈ C′, and suppose there is a path u ⇝ u′ in G. Then there cannot also be a
path v′ ⇝ v in G.

Proof Suppose there is a path v′ ⇝ v in G. Then there are paths u ⇝ u′ ⇝ v′ and
v′ ⇝ v ⇝ u in G. Therefore, u and v′ are reachable from each other, so they are
not in separate SCC’s. ■ (lemma)

SCC(G)
  call DFS(G) to compute finishing times f[u] for all u
  compute G^T
  call DFS(G^T), but in the main loop, consider vertices in order of decreasing f[u]
      (as computed in first DFS)
  output the vertices in each tree of the depth-first forest formed in second DFS
      as a separate SCC

Example:

1. Do DFS

2. G^T

3. DFS (roots blackened)

Time: Θ(V + E).
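
[A Python sketch of the SCC procedure, assuming the graph is a dictionary of
adjacency lists. The function name strongly_connected_components and the
six-vertex example are illustrative. Instead of storing f[u], the first DFS
records vertices in the order they finish, which gives the same decreasing-f
order when reversed.]

```python
def strongly_connected_components(adj):
    """Return a list of SCCs, each a list of vertices."""
    # First DFS on G: record vertices in order of increasing finishing time.
    visited, finish_order = set(), []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finish_order.append(u)

    for u in adj:
        if u not in visited:
            visit(u)

    # Compute G^T by reversing every edge: Theta(V + E) with adjacency lists.
    adj_t = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)

    # Second DFS on G^T, taking roots in order of decreasing finishing time.
    assigned, components = set(), []

    def collect(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in adj_t[u]:
            if v not in assigned:
                collect(v, comp)

    for u in reversed(finish_order):
        if u not in assigned:
            comp = []
            collect(u, comp)         # one depth-first tree = one SCC
            components.append(comp)
    return components


# Hypothetical graph: cycle 1 -> 2 -> 3 -> 1, edge 3 -> 4, cycle 4 <-> 5, isolated 6.
sccs = strongly_connected_components({1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []})
```

[The components found here are {1, 2, 3}, {4, 5}, and {6}.]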

How can this possibly work? 
Idea: By considering vertices in second DFS in decreasing order of finishing times
from first DFS, we are visiting vertices of the component graph in topological sort
order.

To prove that it works, first deal with 2 notational issues:

• Will be discussing d[u] and f[u]. These always refer to first DFS.

• Extend notation for d and f to sets of vertices U ⊆ V:

• d(U) = min_{u ∈ U} {d[u]} (earliest discovery time)

• f(U) = max_{u ∈ U} {f[u]} (latest finishing time)

Lemma

Let C and C′ be distinct SCC’s in G = (V, E). Suppose there is an edge (u, v) ∈ E
such that u ∈ C and v ∈ C′.

Then f(C) > f(C′).

Proof Two cases, depending on which SCC had the first discovered vertex during
the first DFS.

• If d(C) < d(C′), let x be the first vertex discovered in C. At time d[x], all
vertices in C and C′ are white. Thus, there exist paths of white vertices from x
to all vertices in C and C′.

By the white-path theorem, all vertices in C and C′ are descendants of x in
depth-first tree.

By the parenthesis theorem, f[x] = f(C) > f(C′).

• If d(C) > d(C′), let y be the first vertex discovered in C′. At time d[y], all
vertices in C′ are white and there is a white path from y to each vertex in C′ ⇒
all vertices in C′ become descendants of y. Again, f[y] = f(C′).

At time d[y], all vertices in C are white.

By earlier lemma, since there is an edge (u, v), we cannot have a path from C′
to C.

So no vertex in C is reachable from y.

Therefore, at time f[y], all vertices in C are still white.

Therefore, for all w ∈ C, f[w] > f[y], which implies that f(C) > f(C′).

■ (lemma)


Corollary

Let C and C′ be distinct SCC’s in G = (V, E). Suppose there is an edge
(u, v) ∈ E^T, where u ∈ C and v ∈ C′. Then f(C) < f(C′).

Proof (u, v) ∈ E^T ⇒ (v, u) ∈ E. Since SCC’s of G and G^T are the same,
f(C′) > f(C). ■ (corollary)
Corollary

Let C and C′ be distinct SCC’s in G = (V, E), and suppose that f(C) > f(C′).
Then there cannot be an edge from C to C′ in G^T.

Proof It’s the contrapositive of the previous corollary. ■

Now we have the intuition to understand why the SCC procedure works.

When we do the second DFS, on G^T, start with SCC C such that f(C) is maximum.
The second DFS starts from some x ∈ C, and it visits all vertices in C. Corollary
says that since f(C) > f(C′) for all C′ ≠ C, there are no edges from C to C′
in G^T.

Therefore, DFS will visit only vertices in C.

Which means that the depth-first tree rooted at x contains exactly the vertices of C.

The next root chosen in the second DFS is in SCC C′ such that f(C′) is maximum
over all SCC’s other than C. DFS visits all vertices in C′, but the only edges out
of C′ go to C, which we’ve already visited.

Therefore, the only tree edges will be to vertices in C′.

We can continue the process.

Each time we choose a root for the second DFS, it can reach only

• vertices in its SCC (get tree edges to these),

• vertices in SCC’s already visited in second DFS (get no tree edges to these).

We are visiting vertices of (G^T)^SCC in reverse of topologically sorted order.

[The book has a formal proof.] 
Solution to Exercise 22.1-6

We start by observing that if a_ij = 1, so that (i, j) ∈ E, then vertex i cannot be
a universal sink, for it has an outgoing edge. Thus, if row i contains a 1, then
vertex i cannot be a universal sink. This observation also means that if there is a
self-loop (i, i), then vertex i is not a universal sink. Now suppose that a_ij = 0, so
that (i, j) ∉ E, and also that i ≠ j. Then vertex j cannot be a universal sink, for
either its in-degree must be strictly less than |V| − 1 or it has a self-loop. Thus
if column j contains a 0 in any position other than the diagonal entry (j, j), then
vertex j cannot be a universal sink.

Using the above observations, the following procedure returns TRUE if vertex k
is a universal sink, and FALSE otherwise. It takes as input a |V| × |V| adjacency
matrix A = (a_ij).

Is-Sink(A, k)
  let A be |V| × |V|
  for j ← 1 to |V|              ▷ Check for a 1 in row k
    do if a_kj = 1
         then return FALSE
  for i ← 1 to |V|              ▷ Check for an off-diagonal 0 in column k
    do if a_ik = 0 and i ≠ k
         then return FALSE
  return TRUE

Because this procedure runs in O(V) time, we may call it only O(1) times in
order to achieve our O(V)-time bound for determining whether directed graph G
contains a universal sink.

Observe also that a directed graph can have at most one universal sink. This
property holds because if vertex j is a universal sink, then we would have (i, j) ∈ E
for all i ≠ j and so no other vertex i could be a universal sink.

The following procedure takes an adjacency matrix A as input and returns either a 
message that there is no universal sink or a message containing the identity of the 
universal sink. It works by eliminating all but one vertex as a potential universal 
sink and then checking the remaining candidate vertex by a single call to Is-Sink.

Universal-Sink(A)
let A be |V| × |V|
i ← j ← 1
while i ≤ |V| and j ≤ |V|
    do if a_ij = 1
          then i ← i + 1
          else j ← j + 1
s ← 0
if i > |V|
   then return “there is no universal sink”
elseif Is-Sink(A, i) = FALSE
   then return “there is no universal sink”
else return i “is a universal sink”

Universal-Sink walks through the adjacency matrix, starting at the upper left
corner and always moving either right or down by one position, depending on
whether the current entry a_ij it is examining is 0 or 1. It stops once either i or j
exceeds |V|.

To understand why Universal-Sink works, we need to show that after the while
loop terminates, the only vertex that might be a universal sink is vertex i. The call
to Is-Sink then determines whether vertex i is indeed a universal sink.

Let us fix i and j to be values of these variables at the termination of the while
loop. We claim that every vertex k such that 1 ≤ k < i cannot be a universal sink.
That is because the way that i achieved its final value at loop termination was by
finding a 1 in each row k for which 1 ≤ k < i. As we observed above, any vertex k
whose row contains a 1 cannot be a universal sink.

If i > |V| at loop termination, then we have eliminated all vertices from
consideration, and so there is no universal sink. If, on the other hand, i ≤ |V| at loop
termination, we need to show that every vertex k such that i < k ≤ |V| cannot
be a universal sink. If i ≤ |V| at loop termination, then the while loop terminated
because j > |V|. That means that we found a 0 in every column. Recall our earlier
observation that if column k contains a 0 in an off-diagonal position, then vertex k
cannot be a universal sink. Since we found a 0 in every column, we found a 0 in
every column k such that i < k ≤ |V|. Moreover, we never examined any matrix
entries in rows greater than i, and so we never examined the diagonal entry in any
column k such that i < k ≤ |V|. Therefore, all the 0s that we found in columns k
such that i < k ≤ |V| were off-diagonal. We conclude that every vertex k such
that i < k ≤ |V| cannot be a universal sink.

Thus, we have shown that every vertex less than i and every vertex greater than i
cannot be a universal sink. The only remaining possibility is that vertex i might be
a universal sink, and the call to Is-Sink checks whether it is.

To see that Universal-Sink runs in O(V) time, observe that either i or j is
incremented in each iteration of the while loop. Thus, the while loop makes at
most 2|V| − 1 iterations. Each iteration takes O(1) time, for a total while loop
time of O(V) and, combined with the O(V)-time call to Is-Sink, we get a total
running time of O(V).
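
For readers who want to experiment, the two procedures can be sketched in Python as follows. This is our own illustration under stated assumptions: the matrix is a list of lists of 0s and 1s, vertices are numbered from 0 rather than 1, and the function names `is_sink` and `universal_sink` are ours, not the book's.

```python
def is_sink(A, k):
    """Return True if vertex k is a universal sink of adjacency matrix A (0-indexed)."""
    n = len(A)
    # Row k must contain no 1: a universal sink has no outgoing edges.
    if any(A[k][j] == 1 for j in range(n)):
        return False
    # Column k must contain no off-diagonal 0: every other vertex points to k.
    return all(A[i][k] == 1 for i in range(n) if i != k)


def universal_sink(A):
    """Return the index of the universal sink, or None if there is none."""
    n = len(A)
    i = j = 0
    # Move down on a 1 (row i contains a 1, so vertex i is ruled out),
    # otherwise move right.
    while i < n and j < n:
        if A[i][j] == 1:
            i += 1
        else:
            j += 1
    if i >= n:
        return None
    return i if is_sink(A, i) else None
```

The walk examines at most 2|V| − 1 entries, and the single call to `is_sink` examines at most 2|V| more, which matches the O(V) bound argued above.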


Solution to Exercise 22.1-7

\[
BB^T(i, j) = \sum_{e \in E} b_{ie} b^T_{ej} = \sum_{e \in E} b_{ie} b_{je}
\]

• If i = j, then b_ie b_je = 1 (it is 1 · 1 or (−1) · (−1)) whenever e enters or leaves
vertex i, and 0 otherwise.

• If i ≠ j, then b_ie b_je = −1 when e = (i, j) or e = (j, i), and 0 otherwise.

Thus,

\[
BB^T(i, j) =
\begin{cases}
\text{degree of } i = \text{in-degree} + \text{out-degree} & \text{if } i = j , \\
-(\text{\# of edges connecting } i \text{ and } j) & \text{if } i \neq j .
\end{cases}
\]
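
A small numerical check can make the result concrete. The sketch below is our own illustration, assuming a directed graph without self-loops, vertices numbered from 0, and the book's convention that b_ie is −1 if edge e leaves vertex i and 1 if edge e enters it; the helper names are hypothetical.

```python
def incidence_matrix(n, edges):
    """Build the |V| x |E| incidence matrix B of a directed graph without self-loops."""
    B = [[0] * len(edges) for _ in range(n)]
    for e, (u, v) in enumerate(edges):
        B[u][e] = -1   # edge e leaves u
        B[v][e] = 1    # edge e enters v
    return B


def b_bt(B):
    """Return the product B * B^T as a list of lists."""
    n = len(B)
    return [[sum(x * y for x, y in zip(B[i], B[j])) for j in range(n)]
            for i in range(n)]
```

For the edges (0, 1), (1, 2), (2, 0), (0, 2), the diagonal of the product holds the degrees 3, 2, 3, and the entry for vertices 0 and 2 is −2 because two edges connect them.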


Solution to Exercise 22.2-4 

The correctness proof for the BFS algorithm shows that d[u] = δ(s, u), and the
algorithm doesn’t assume that the adjacency lists are in any particular order.

In Figure 22.3, if t precedes x in Adj[w], we can get the breadth-first tree shown
in the figure. But if x precedes t in Adj[w] and u precedes y in Adj[x], we can get
edge (x, u) in the breadth-first tree.


Solution to Exercise 22.2-5 


The edges in E_π are shaded in the following graph:



To see that E_π cannot be a breadth-first tree, let’s suppose that Adj[s] contains u
before v. BFS adds edges (s, u) and (s, v) to the breadth-first tree. Since u is
enqueued before v, BFS then adds edges (u, w) and (u, x). (The order of w and x
in Adj[u] doesn’t matter.) Symmetrically, if Adj[s] contains v before u, then BFS
adds edges (s, v) and (s, u) to the breadth-first tree, v is enqueued before u, and
BFS adds edges (v, w) and (v, x). (Again, the order of w and x in Adj[v] doesn’t
matter.) BFS will never put both edges (u, w) and (v, x) into the breadth-first tree.
In fact, it will also never put both edges (u, x) and (v, w) into the breadth-first tree.


Solution to Exercise 22.2-6 

Create a graph G where each vertex represents a wrestler and each edge represents
a rivalry. The graph will contain n vertices and r edges.

Perform as many BFS’s as needed to visit all vertices. Assign all wrestlers whose
distance is even to be good guys and all wrestlers whose distance is odd to be bad
guys. Then check each edge to verify that it goes between a good guy and a bad
guy. This solution would take O(n + r) time for the BFS, O(n) time to designate
each wrestler as a good guy or bad guy, and O(r) time to check edges, which is
O(n + r) time overall.
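
The procedure might be sketched in Python as below. This is an illustration only: we assume wrestlers are numbered 0 to n − 1 and rivalries are given as a list of pairs, and the function name `designate_wrestlers` is ours.

```python
from collections import deque


def designate_wrestlers(n, rivalries):
    """Return a list of 'good'/'bad' labels, or None if no valid designation exists."""
    adj = [[] for _ in range(n)]
    for a, b in rivalries:
        adj[a].append(b)
        adj[b].append(a)
    dist = [None] * n
    # Run one BFS per connected component so that every wrestler gets a distance.
    for s in range(n):
        if dist[s] is not None:
            continue
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    # Every rivalry must join an even-distance wrestler to an odd-distance one.
    for a, b in rivalries:
        if dist[a] % 2 == dist[b] % 2:
            return None
    return ["good" if d % 2 == 0 else "bad" for d in dist]
```

A chain of rivalries 0–1–2–3 alternates good and bad, whereas three mutually rival wrestlers admit no designation and yield `None`.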


Solution to Exercise 22.3-4 

a. Edge (u, v) is a tree edge or forward edge if and only if v is a descendant of u
in the depth-first forest. (If (u, v) is a back edge, then u is a descendant of v,
and if (u, v) is a cross edge, then neither of u or v is a descendant of the other.)
By Corollary 22.8, therefore, (u, v) is a tree edge or forward edge if and only if
d[u] < d[v] < f[v] < f[u].

b. First, suppose that (u, v) is a back edge. A self-loop is by definition a back
edge. If (u, v) is a self-loop, then clearly d[v] = d[u] < f[u] = f[v]. If (u, v)
is not a self-loop, then u is a descendant of v in the depth-first forest, and by
Corollary 22.8, d[v] < d[u] < f[u] < f[v].

Now, suppose that d[v] ≤ d[u] < f[u] ≤ f[v]. If u and v are the same
vertex, then d[v] = d[u] < f[u] = f[v], and (u, v) is a self-loop and hence
a back edge. If u and v are distinct, then d[v] < d[u] < f[u] < f[v]. By
Theorem 22.7, interval [d[u], f[u]] is contained entirely within the interval
[d[v], f[v]], and u is a descendant of v in a depth-first tree. Thus, (u, v) is a
back edge.

c. First, suppose that (u, v) is a cross edge. Since neither u nor v is an ancestor
of the other, Theorem 22.7 says that the intervals [d[u], f[u]] and [d[v], f[v]]
are entirely disjoint. Thus, we must have either d[u] < f[u] < d[v] < f[v]
or d[v] < f[v] < d[u] < f[u]. We claim that we cannot have d[u] < d[v] if
(u, v) is a cross edge. Why? If d[u] < d[v], then v is white at time d[u]. By
Theorem 22.9, v is a descendant of u, which contradicts (u, v) being a cross
edge. Thus, we must have d[v] < f[v] < d[u] < f[u].

Now suppose that d[v] < f[v] < d[u] < f[u]. By Theorem 22.7, neither u
nor v is a descendant of the other, which means that (u, v) must be a cross edge.
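
The three timestamp tests can be tried out directly. The sketch below is our own illustration, assuming a simple directed graph given as adjacency lists over vertices 0 to |V| − 1; it runs a recursive DFS (fine for small examples) and then classifies each edge purely from d and f, using the predecessor only to tell tree edges from forward edges.

```python
def classify_edges(adj):
    """Run DFS on a directed graph and classify every edge by the timestamps d and f."""
    n = len(adj)
    d = [0] * n            # 0 means "not yet discovered"
    f = [0] * n
    parent = [None] * n
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        for v in adj[u]:
            if d[v] == 0:
                parent[v] = u
                visit(v)
        time += 1
        f[u] = time

    for u in range(n):
        if d[u] == 0:
            visit(u)

    kinds = {}
    for u in range(n):
        for v in adj[u]:
            if d[u] < d[v] < f[v] < f[u]:
                kinds[(u, v)] = "tree" if parent[v] == u else "forward"
            elif d[v] <= d[u] < f[u] <= f[v]:
                kinds[(u, v)] = "back"
            else:  # the only remaining case is d[v] < f[v] < d[u] < f[u]
                kinds[(u, v)] = "cross"
    return kinds
```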


Solution to Exercise 22.3-7 


Let us consider the example graph and depth-first search below.

          d    f
     w    1    6
     u    2    3
     v    4    5

Clearly, there is a path from u to v in G. The bold edges are in the depth-first
forest produced. We can see that d[u] < d[v] in the depth-first search but v is not
a descendant of u in the forest.


Solution to Exercise 22.3-8 


Let us consider the example graph and depth-first search below.

Clearly, there is a path from u to v in G. The bold edges of G are in the depth-first
forest produced by the search. However, d[v] > f[u] and the conjecture is false.


Solution to Exercise 22.3-10 


Let us consider the example graph and depth-first search below.

Clearly u has both incoming and outgoing edges in G but a depth-first search of G
produced a depth-first forest where u is in a tree by itself.


Solution to Exercise 22.3-11 

Compare the following pseudocode to the pseudocode of DFS on page 541 of the 
book. Changes were made in order to assign the desired cc label to vertices. 

DFS(G)
for each vertex u ∈ V[G]
    do color[u] ← WHITE
       π[u] ← NIL
time ← 0
counter ← 0
for each vertex u ∈ V[G]
    do if color[u] = WHITE
          then counter ← counter + 1
               DFS-Visit(u, counter)

DFS-Visit(u, counter)
color[u] ← GRAY
cc[u] ← counter                  ▷ Label the vertex.
time ← time + 1
d[u] ← time
for each v ∈ Adj[u]
    do if color[v] = WHITE
          then π[v] ← u
               DFS-Visit(v, counter)
color[u] ← BLACK
f[u] ← time ← time + 1

This DFS increments a counter each time DFS-Visit is called to grow a new tree 
in the DFS forest. Every vertex visited (and added to the tree) by DFS-Visit is 
labeled with that same counter value. Thus cc[u] = cc[v] if and only if u and v are
visited in the same call to DFS-Visit from DFS, and the final value of the counter 
is the number of calls that were made to DFS-Visit by DFS. Also, since every 
vertex is visited eventually, every vertex is labeled. 

Thus all we need to show is that the vertices visited by each call to DFS-Visit 
from DFS are exactly the vertices in one connected component of G. 

• All vertices in a connected component are visited by one call to DFS-Visit 
from DFS: 

Let u be the first vertex in component C visited by DFS-Visit. Since a vertex 
becomes non-white only when it is visited, all vertices in C are white when 
DFS-Visit is called for u. Thus, by the white-path theorem, all vertices in C 
become descendants of u in the forest, which means that all vertices in C are 
visited (by recursive calls to DFS-Visit) before DFS-Visit returns to DFS. 

• All vertices visited by one call to DFS-Visit from DFS are in the same
connected component:

If two vertices are visited in the same call to DFS-Visit from DFS, they are in
the same connected component, because vertices are visited only by following
paths in G (by following edges found in adjacency lists, starting from some
vertex).
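
A compact Python sketch of the labeling follows. It is an illustration under our own assumptions: the undirected graph is given as adjacency lists over vertices 0 to |V| − 1, and an explicit stack replaces the recursion of DFS-Visit, so the visiting order may differ from the pseudocode while the labels cc[u] come out the same.

```python
def connected_components(adj):
    """Label each vertex of an undirected graph with its component number cc[u]."""
    n = len(adj)
    cc = [0] * n          # 0 plays the role of WHITE (not yet visited)
    counter = 0
    for s in range(n):
        if cc[s] != 0:
            continue
        counter += 1      # s starts a new tree in the forest
        cc[s] = counter
        stack = [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if cc[v] == 0:
                    cc[v] = counter
                    stack.append(v)
    return cc, counter
```

The second return value is the final counter, that is, the number of connected components.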


Solution to Exercise 22.4-3 

An undirected graph is acyclic (i.e., a forest) if and only if a DFS yields no back 
edges. 

• If there’s a back edge, there’s a cycle. 

• If there’s no back edge, then by Theorem 22.10, there are only tree edges. 
Hence, the graph is acyclic. 

Thus, we can run DFS: if we find a back edge, there’s a cycle. 

• Time: O(V). (Not O(V + E)!)

If we ever see |V| distinct edges, we must have seen a back edge because (by
Theorem B.2 on p. 1085) in an acyclic (undirected) forest, |E| ≤ |V| − 1.
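
One way to realize the O(V) bound is to stop at the first edge that leads to an already visited vertex other than the parent. The sketch below is our own illustration, assuming a simple undirected graph (no parallel edges) given as adjacency lists over vertices 0 to |V| − 1; it uses an explicit stack rather than recursive DFS, which does not affect the answer.

```python
def has_cycle_undirected(adj):
    """Return True if the simple undirected graph given by adjacency lists has a cycle."""
    n = len(adj)
    visited = [False] * n
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, -1)]           # (vertex, parent in the search tree)
        while stack:
            u, parent = stack.pop()
            for v in adj[u]:
                if v == parent:
                    continue        # the tree edge back to the parent
                if visited[v]:
                    return True     # a non-tree edge between visited vertices closes a cycle
                visited[v] = True
                stack.append((v, u))
    return False
```

Each edge examination either skips the parent, discovers a new vertex, or ends the search, so at most about 2|V| adjacency-list entries are examined regardless of |E|.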


Solution to Exercise 22.4-5

Topological-Sort(G)
▷ Initialize in-degree, Θ(V) time
for each vertex u ∈ V
    do in-degree[u] ← 0
▷ Compute in-degree, Θ(V + E) time
for each vertex u ∈ V
    do for each v ∈ Adj[u]
          do in-degree[v] ← in-degree[v] + 1
▷ Initialize Queue, Θ(V) time
Q ← ∅
for each vertex u ∈ V
    do if in-degree[u] = 0
          then Enqueue(Q, u)
▷ while loop takes O(V + E) time
while Q ≠ ∅
    do u ← Dequeue(Q)
       output u
       ▷ for loop executes O(E) times total
       for each v ∈ Adj[u]
           do in-degree[v] ← in-degree[v] − 1
              if in-degree[v] = 0
                 then Enqueue(Q, v)
▷ Check for cycles, O(V) time
for each vertex u ∈ V
    do if in-degree[u] ≠ 0
          then report that there’s a cycle
▷ Another way to check for cycles would be to count the vertices
▷ that are output and report a cycle if that number is < |V|

To find and output vertices of in-degree 0, we first compute all vertices’ in-degrees 
by making a pass through all the edges (by scanning the adjacency lists of all the 
vertices) and incrementing the in-degree of each vertex an edge enters. 

• This takes Θ(V + E) time (|V| adjacency lists accessed, |E| edges total found
in those lists, Θ(1) work for each edge).

We keep the vertices with in-degree 0 in a FIFO queue, so that they can be
enqueued and dequeued in O(1) time. (The order in which vertices in the queue are
processed doesn’t matter, so any kind of queue works.)

• Initializing the queue takes one pass over the vertices doing Θ(1) work, for
total time Θ(V).

As we process each vertex from the queue, we effectively remove its outgoing
edges from the graph by decrementing the in-degree of each vertex one of those
edges enters, and we enqueue any vertex whose in-degree goes to 0. There’s
no need to actually remove the edges from the adjacency list, because that
adjacency list will never be processed again by the algorithm: Each vertex is
enqueued/dequeued at most once because it is enqueued only if it starts out with
in-degree 0 or if its in-degree becomes 0 after being decremented (and never
incremented) some number of times.

• The processing of a vertex from the queue happens O(V) times because no
vertex can be enqueued more than once. The per-vertex work (dequeue and
output) takes O(1) time, for a total of O(V) time.

• Because the adjacency list of each vertex is scanned only when the vertex is
dequeued, the adjacency list of each vertex is scanned at most once. Since the
sum of the lengths of all the adjacency lists is Θ(E), at most O(E) time is spent
in total scanning adjacency lists. For each edge in an adjacency list, Θ(1) work
is done, for a total of O(E) time.

Thus the total time taken by the algorithm is O(V + E).

The algorithm outputs vertices in the right order (u before v for every edge (u, v)) 
because v will not be output until its in-degree becomes 0, which happens only 
when every edge (u, v) leading into v has been “removed” due to the processing 
(including output) of u. 

If there are no cycles, all vertices are output. 

• Proof: Assume that some vertex v_0 is not output. v_0 cannot start out with
in-degree 0 (or it would be output), so there are edges into v_0. Since v_0’s in-degree
never becomes 0, at least one edge (v_1, v_0) is never removed, which means that
at least one other vertex v_1 was not output. Similarly, v_1 not output means that
some vertex v_2 such that (v_2, v_1) ∈ E was not output, and so on. Since the
number of vertices is finite, this path (··· → v_2 → v_1 → v_0) is finite, so we
must have v_i = v_j for some i and j in this sequence, which means there is a
cycle.

If there are cycles, not all vertices will be output, because some in-degrees never 
become 0. 

• Proof: Assume that a vertex in a cycle is output (its in-degree becomes 0). Let v
be the first vertex in its cycle to be output, and let u be v’s predecessor in the
cycle. In order for v’s in-degree to become 0, the edge (u, v) must have been
“removed,” which happens only when u is processed. But this cannot have
happened, because v is the first vertex in its cycle to be processed. Thus no
vertices in cycles are output.
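
The whole procedure fits in a few lines of Python. The sketch below is our own illustration, assuming adjacency lists over vertices 0 to |V| − 1; it uses the alternative cycle check mentioned in the pseudocode comments (counting the vertices that are output), and the function name is ours.

```python
from collections import deque


def topological_sort(adj):
    """Return (order, has_cycle) for a directed graph given as adjacency lists."""
    n = len(adj)
    in_degree = [0] * n
    for u in range(n):
        for v in adj[u]:
            in_degree[v] += 1
    queue = deque(u for u in range(n) if in_degree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            in_degree[v] -= 1       # "remove" edge (u, v)
            if in_degree[v] == 0:
                queue.append(v)
    # Fewer than |V| vertices output means some in-degree never reached 0.
    return order, len(order) < n
```

When the graph has a cycle, `order` still lists the vertices that could be output before the search stalled, which can help locate the cycle.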


Solution to Exercise 22.5-5 

We have at our disposal an O(V + E)-time algorithm that computes strongly
connected components. Let us assume that the output of this algorithm is a mapping
scc[u], giving the number of the strongly connected component containing
vertex u, for each vertex u. Without loss of generality, assume that scc[u] is an integer
in the set {1, 2, ..., |V|}.

Construct the multiset (a set that can contain the same object more than once)
T = {scc[u] : u ∈ V}, and sort it by using counting sort. Since the values we are
sorting are integers in the range 1 to |V|, the time to sort is O(V). Go through the
sorted multiset T and every time we find an element x that is distinct from the one
before it, add x to V^SCC. (Consider the first element of the sorted set as “distinct
from the one before it.”) It takes O(V) time to construct V^SCC.

Construct the set of ordered pairs 

S = {(x, y) : there is an edge (u, v) ∈ E, x = scc[u], and y = scc[v]} .

We can easily construct this set in Θ(E) time by going through all edges in E and
looking up scc[u] and scc[v] for each edge (u, v) ∈ E.

Having constructed S, remove all elements of the form (x, x). Alternatively, when
we construct S, do not put an element in S when we find an edge (u, v) for which
scc[u] = scc[v]. S now has at most |E| elements.

Now sort the elements of S using radix sort. Sort on one component at a time. The
order does not matter. In other words, we are performing two passes of counting
sort. The time to do so is O(V + E), since the values we are sorting on are integers
in the range 1 to |V|.

Finally, go through the sorted set S, and every time we find an element (x, y)
that is distinct from the element before it (again considering the first element of
the sorted set as distinct from the one before it), add (x, y) to E^SCC. Sorting and
then adding (x, y) only if it is distinct from the element before it ensures that we
add (x, y) at most once. It takes O(E) time to go through S in this way, once S
has been sorted.

The total time is O(V + E).
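
The construction can be sketched in Python as follows. This is an illustration under our own assumptions: the graph is given as adjacency lists over vertices 0 to |V| − 1, the labels `scc[u]` are small nonnegative integers already computed by some SCC algorithm, and a presence array stands in for the counting sort of the labels. The helper names are hypothetical.

```python
def counting_sort_pairs(pairs, key_index, max_key):
    """Stable counting sort of pairs on one coordinate, with keys in 0..max_key."""
    buckets = [[] for _ in range(max_key + 1)]
    for p in pairs:
        buckets[p[key_index]].append(p)
    return [p for bucket in buckets for p in bucket]


def component_graph(adj, scc):
    """Return (V_scc, E_scc) given adjacency lists and an SCC label scc[u] per vertex."""
    n = len(adj)
    max_label = max(scc, default=0)
    # Vertices: record which labels occur, then read them off in increasing order.
    seen = [False] * (max_label + 1)
    for label in scc:
        seen[label] = True
    v_scc = [label for label in range(max_label + 1) if seen[label]]
    # Edges: map each edge to a pair of labels, dropping pairs of the form (x, x).
    s = [(scc[u], scc[v]) for u in range(n) for v in adj[u] if scc[u] != scc[v]]
    # Radix sort as two stable counting-sort passes, one per coordinate.
    s = counting_sort_pairs(s, 1, max_label)
    s = counting_sort_pairs(s, 0, max_label)
    # Keep a pair only if it differs from the pair before it.
    e_scc = [p for i, p in enumerate(s) if i == 0 or p != s[i - 1]]
    return v_scc, e_scc
```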


Solution to Exercise 22.5-6 

The basic idea is to replace the edges within each SCC by one simple, directed
cycle and then remove redundant edges between SCC’s. Since there must be at
least k edges within an SCC that has k > 1 vertices, a single directed cycle of k edges
gives the k-vertex SCC with the fewest possible edges.

The algorithm works as follows: 

1. Identify all SCC’s of G. Time: Θ(V + E), using the SCC algorithm in
Section 22.5.

2. Form the component graph G^SCC. Time: O(V + E), by Exercise 22.5-5.

3. Start with E' = ∅. Time: O(1).

4. For each SCC of G, let the vertices in the SCC be v_1, v_2, ..., v_k, and add to E'
the directed edges (v_1, v_2), (v_2, v_3), ..., (v_{k−1}, v_k), (v_k, v_1). These edges form
a simple, directed cycle that includes all vertices of the SCC. (An SCC with a
single vertex needs no edges.) Time for all SCC’s: O(V).

5. For each edge (u, v) in the component graph G^SCC, select any vertex x in u’s
SCC and any vertex y in v’s SCC, and add the directed edge (x, y) to E'.
Time: O(E).

Thus, the total time is Θ(V + E).
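
Steps 3 through 5 might look as follows in Python. This sketch is our own and makes several assumptions: the SCC labels `scc[u]` are supplied by the caller, a hash set replaces the radix sort of Exercise 22.5-5 for detecting repeated component-graph edges (so the bound is expected rather than worst-case time), and the original edge (u, v) itself serves as the representative pair (x, y).

```python
def minimal_equivalent_edges(adj, scc):
    """Return an edge list E' with the same SCCs and component graph as the input."""
    n = len(adj)
    members = {}
    for u in range(n):
        members.setdefault(scc[u], []).append(u)
    new_edges = []
    # One simple directed cycle through the vertices of each SCC with k > 1 vertices.
    for vs in members.values():
        if len(vs) > 1:
            for a, b in zip(vs, vs[1:] + vs[:1]):
                new_edges.append((a, b))
    # One representative edge for each distinct edge of the component graph.
    seen = set()
    for u in range(n):
        for v in adj[u]:
            pair = (scc[u], scc[v])
            if pair[0] != pair[1] and pair not in seen:
                seen.add(pair)
                new_edges.append((u, v))
    return new_edges
```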


Solution to Exercise 22.5-7

To determine if G = (V, E) is semiconnected, do the following:

1. Call Strongly-Connected-Components.

2. Form the component graph. (By Exercise 22.5-5, you may assume that this
takes O(V + E) time.)

3. Topologically sort the component graph. (Recall that it’s a dag.) Assuming that
there are k SCC’s, the topological sort gives a linear ordering (v_1, v_2, ..., v_k)
of the vertices.

4. Verify that the sequence of vertices (v_1, v_2, ..., v_k) given by topological sort
forms a linear chain in the component graph. That is, verify that the edges
(v_1, v_2), (v_2, v_3), ..., (v_{k−1}, v_k) exist in the component graph. If the vertices
form a linear chain, then the original graph is semiconnected; otherwise it is
not.

Because we know that all vertices in each SCC are mutually reachable from each 
other, it suffices to show that the component graph is semiconnected if and only if 
it contains a linear chain. We must also show that if there’s a linear chain in the 
component graph, it’s the one returned by topological sort. 

We’ll first show that if there’s a linear chain in the component graph, then it’s the 
one returned by topological sort. In fact, this is trivial. A topological sort has to 
respect every edge in the graph. So if there’s a linear chain, a topological sort must 
give us the vertices in order. 

Now we’ll show that the component graph is semiconnected if and only if it
contains a linear chain.

First, suppose that the component graph contains a linear chain. Then for every
pair of vertices u, v in the component graph, there is a path between them. If u
precedes v in the linear chain, then there’s a path u ⇝ v. Otherwise, v precedes u,
and there’s a path v ⇝ u.

Conversely, suppose that the component graph does not contain a linear chain.
Then in the list returned by topological sort, there are two consecutive vertices v_i
and v_{i+1}, but the edge (v_i, v_{i+1}) is not in the component graph. Any edges out of v_i
are to vertices v_j, where j > i + 1, and so there is no path from v_i to v_{i+1} in the
component graph. And since v_{i+1} follows v_i in the topological sort, there cannot be
any paths at all from v_{i+1} to v_i. Thus, the component graph is not semiconnected.

Running time of each step:

1. Θ(V + E).

2. O(V + E).

3. Since the component graph has at most |V| vertices and at most |E| edges,
O(V + E).

4. Also O(V + E). We just check the adjacency list of each vertex v_i in the
component graph to verify that there’s an edge (v_i, v_{i+1}). We’ll go through
each adjacency list once.

Thus, the total running time is Θ(V + E).
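
The four steps can be combined into one short Python sketch. It is our own illustration with the following assumptions: adjacency lists over vertices 0 to |V| − 1, recursive DFS (adequate for small inputs), and the two-pass SCC algorithm of Section 22.5. It relies on the fact that the second pass discovers SCC's in a topological order of the component graph, so numbering them in discovery order makes a separate topological sort unnecessary.

```python
def is_semiconnected(adj):
    """Return True if for every pair u, v there is a path u -> v or a path v -> u."""
    n = len(adj)
    # First pass: DFS on G, recording vertices in order of increasing finishing time.
    visited = [False] * n
    finish_order = []

    def dfs(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs(v)
        finish_order.append(u)

    for u in range(n):
        if not visited[u]:
            dfs(u)

    # Second pass: DFS on the transpose in order of decreasing finishing time.
    radj = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)
    comp = [-1] * n
    k = 0

    def assign(u, label):
        comp[u] = label
        for v in radj[u]:
            if comp[v] == -1:
                assign(v, label)

    for u in reversed(finish_order):
        if comp[u] == -1:
            assign(u, k)
            k += 1

    # Components 0, 1, ..., k-1 are in topological order of the component graph,
    # so it suffices to look for the chain edges (i, i + 1).
    has_chain_edge = [False] * k
    for u in range(n):
        for v in adj[u]:
            if comp[v] == comp[u] + 1:
                has_chain_edge[comp[u]] = True
    return all(has_chain_edge[i] for i in range(k - 1))
```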


Solution to Problem 22-1

a. 1. Suppose (u, v) is a back edge or a forward edge in a BFS of an undirected
graph. Then one of u and v, say u, is a proper ancestor of the other (v) in
the breadth-first tree. Since we explore all edges of u before exploring any
edges of any of u’s descendants, we must explore the edge (u, v) at the time
we explore u. But then (u, v) must be a tree edge.

2. In BFS, an edge (u, v) is a tree edge when we set π[v] ← u. But we only do
so when we set d[v] ← d[u] + 1. Since neither d[u] nor d[v] ever changes
thereafter, we have d[v] = d[u] + 1 when BFS completes.

3. Consider a cross edge (u, v) where, without loss of generality, u is visited
before v. At the time we visit u, vertex v must already be on the queue, for
otherwise (u, v) would be a tree edge. Because v is on the queue, we have
d[v] ≤ d[u] + 1 by Lemma 22.3. By Corollary 22.4, we have d[v] ≥ d[u].
Thus, either d[v] = d[u] or d[v] = d[u] + 1.

b. 1. Suppose (u, v) is a forward edge. Then we would have explored it while
visiting u, and it would have been a tree edge.

2. Same as for undirected graphs.

3. For any edge (u, v), whether or not it’s a cross edge, we cannot have
d[v] > d[u] + 1, since we visit v at the latest when we explore edge (u, v).
Thus, d[v] ≤ d[u] + 1.

4. Clearly, d[v] ≥ 0 for all vertices v. For a back edge (u, v), v is an ancestor
of u in the breadth-first tree, which means that d[v] ≤ d[u]. (Note that since
self-loops are considered to be back edges, we could have u = v.)


Solution to Problem 22-3 

a. An Euler tour is a single cycle that traverses each edge of G exactly once, but 
it might not be a simple cycle. An Euler tour can be decomposed into a set of 
edge-disjoint simple cycles, however. 

If G has an Euler tour, therefore, we can look at the simple cycles that, together,
form the tour. In each simple cycle, each vertex in the cycle has one entering
edge and one leaving edge. In each simple cycle, therefore, each vertex v has
in-degree(v) = out-degree(v), where the degrees are either 1 (if v is on the
simple cycle) or 0 (if v is not on the simple cycle). Adding the in- and
out-degrees over all of the simple cycles proves that if G has an Euler tour, then
in-degree(v) = out-degree(v) for all vertices v.

We prove the converse (that if in-degree(v) = out-degree(v) for all vertices v,
then G has an Euler tour) in two different ways. One proof is nonconstructive,
and the other proof will help us design the algorithm for part (b).

First, we claim that if in-degree(v) = out-degree(v) for all vertices v, then we
can pick any vertex u for which in-degree(u) = out-degree(u) ≥ 1 and create
a cycle (not necessarily simple) that contains u. To prove this claim, let us start
by placing vertex u on the cycle, and choose any leaving edge of u, say (u, v).
Now we put v on the cycle. Since in-degree(v) = out-degree(v) ≥ 1, we can
pick some leaving edge of v and continue visiting edges and vertices. Each time
we pick an edge, we can remove it from further consideration. At each vertex
other than u, at the time we visit an entering edge, there must be an unvisited
leaving edge, since in-degree(v) = out-degree(v) for all vertices v. The only
vertex for which there might not be an unvisited leaving edge is u, since we
started the cycle by visiting one of u’s leaving edges. Since there’s always a
leaving edge we can visit from all vertices other than u, eventually the cycle
must return to u, thus proving the claim.

The nonconstructive proof proves the contrapositive (that if G does not have
an Euler tour, then in-degree(v) ≠ out-degree(v) for some vertex v) by
contradiction. Choose a graph G = (V, E) that does not have an Euler tour but
has at least one edge and for which in-degree(v) = out-degree(v) for all
vertices v, and let G have the fewest edges of any such graph. By the above claim,
G contains a cycle. Let C be a cycle of G with the greatest number of edges,
and let V_C be the set of vertices visited by cycle C. By our assumption, C is
not an Euler tour, and so the set of edges E' = E − C is nonempty. If we use
the set V of vertices and the set E' of edges, we get the graph G' = (V, E');
this graph has in-degree(v) = out-degree(v) for all vertices v, since we have
removed one entering edge and one leaving edge for each vertex on cycle C.
Consider any component G'' = (V'', E'') of G', and observe that G'' also has
in-degree(v) = out-degree(v) for all vertices v. Since E'' ⊆ E' ⊂ E, it follows
from how we chose G that G'' must have an Euler tour, say C'. Because the
original graph G is connected, there must be some vertex x ∈ V'' ∩ V_C and,
without loss of generality, consider x to be the first and last vertex on both C
and C'. But then the cycle C'' formed by first traversing C and then
traversing C' is a cycle of G with more edges than C, contradicting our choice of C.
We conclude that C must have been an Euler tour.

The constructive proof uses the same ideas. Let us start at a vertex u and, via
random traversal of edges, create a cycle. We know that once we take any edge
entering a vertex v ≠ u, we can find an edge leaving v that we have not yet
taken. Eventually, we get back to vertex u, and if there are still edges leaving it 
that we have not taken, we can continue the cycle. Eventually, we get back to 
vertex u and there are no untaken edges leaving u. If we have visited every 
edge in the graph G, we are done. Otherwise, since G is connected, there must 
be some unvisited edge leaving a vertex, say v, on the cycle. We can traverse 
a new cycle starting at v, visiting only previously unvisited edges, and we can 
splice this cycle into the cycle we already know. That is, if the original cycle 
is (u, ..., v, w, ..., u), and the new cycle is (v, x, ..., v), then we can create 
the cycle (u,... ,v,x,... ,v,w,... ,u). We continue this process of finding a 
vertex with an unvisited leaving edge on a visited cycle, visiting a cycle starting 
and ending at this vertex, and splicing in the newly visited cycle, until we have 
visited every edge. 

b. The algorithm is based on the idea in the constructive proof above. 

We assume that G is represented by adjacency lists, and we work with a copy
of the adjacency lists, so that as we visit each edge, we can remove it from
its adjacency list. The singly linked form of adjacency list will suffice. The
output of this algorithm is a doubly linked list T of vertices which, read in list
order, will give an Euler tour. The algorithm constructs T by finding cycles
(also represented by doubly linked lists) and splicing them into T. By using
doubly linked lists for cycles and the Euler tour, splicing a cycle into the Euler
tour takes constant time.

We also maintain a singly linked list L in which each list element consists of 
two parts: 

1. a vertex v, and

2. a pointer to some appearance of v in T. 

Initially, L contains one vertex, which may be any vertex of G. 

Here is the algorithm: 

Euler-Tour(G)
T ← empty list
L ← (any vertex v, NIL)
while L is not empty
    do remove (v, location-in-T) from L
       C ← Visit(v)
       if location-in-T = NIL
          then T ← C
          else splice C into T just before location-in-T
return T

Visit(v)
C ← empty sequence of vertices
u ← v
while out-degree(u) > 0
    do let w be the first vertex in Adj[u]
       remove w from Adj[u], decrementing out-degree(u)
       add u onto the end of C
       if out-degree(u) > 0
          then add (u, u’s location in C) to L
       u ← w
return C

The use of NIL in the initial assignment to L ensures that the first cycle C
returned by Visit becomes the current version of the Euler tour T. All cycles
returned by Visit thereafter are spliced into T. We assume that whenever an
empty cycle is returned by Visit, splicing it into T leaves T unchanged.

Each time Euler-Tour removes a vertex v from the list L, it calls Visit(v)
to find a cycle C, possibly empty and possibly not simple, that starts and ends
at v; the cycle C is represented by a list that starts with v and ends with the last
vertex on the cycle before the cycle ends at v. Euler-Tour then splices this
cycle C into the Euler tour T just before some appearance of v in T.

When Visit is at a vertex u, it looks for some vertex w such that the edge (u, w)
has not yet been visited. Removing w from Adj[u] ensures that we will never
visit (u, w) again. Visit adds u onto the cycle C that it constructs. If, after
removing edge (u, w), vertex u still has any leaving edges, then u, along with
its location in C, is added to L. The cycle construction continues from w, and
it ceases once a vertex with no unvisited leaving edges is found. Using the
argument from part (a), at that point, this vertex must close up a cycle. At that
point, therefore, the cycle C is returned.

It is possible that a vertex u has unvisited leaving edges at the time it is added to
list L in Visit, but that by the time that u is removed from L in Euler-Tour,
all of its leaving edges have been visited. In this case, the while loop of Visit
executes 0 iterations, and Visit returns an empty cycle.

Once the list L is empty, every edge has been visited. The resulting cycle T is
then an Euler tour.

To see that Euler-Tour takes O(E) time, observe that because we remove
each edge from its adjacency list as it is visited, no edge is visited more than
once. Since each edge is visited at some time, the number of times that a vertex
is added to L, and thus removed from L, is at most |E|. Thus, the while loop in
Euler-Tour executes at most |E| iterations. The while loop in Visit executes
one iteration per edge in the graph, and so it executes at most |E| iterations as
well. Since adding vertex u to the doubly linked list C takes constant time and
splicing C into T takes constant time, the entire algorithm takes O(E) time.

Here is a variation on Euler-Tour, which may be a bit simpler to reason
about. It maintains a pointer u to a vertex on the Euler tour, with the invariant
that all vertices on the Euler tour behind u have already had all entering and
leaving edges added to the tour. This variation calls the same procedure Visit
as above.

Euler-Tour'(G)
v ← any vertex
T ← Visit(v)
mark v’s position as the starting vertex in T
u ← next[v]
while u’s position in T ≠ v’s position in T
    do C ← Visit(u)
       splice C into T, just before u’s position
       ▷ If C was empty, T has not changed.
       ▷ If C was nonempty, then it began with u.
       u ← next[next[prev[u]]]
       ▷ If C was empty, u now points to the next vertex on T.
       ▷ If C was nonempty, u now points to the next vertex on C
       ▷ (which has been spliced into T).
return T

Whenever we return from calling VISIT(u), we know that out-degree(u) = 0, 
which means that we have visited all edges entering or leaving vertex u. Since 
VISIT adds each edge it visits to the cycle C, which is then added to the Euler 
tour T, when we return from a call to VISIT(u), all edges entering or leaving 
vertex u have been added to the tour. When we advance the pointer u in the 
while loop, we need to ensure that it is advanced according to the current tour T, 
which may have just had a cycle C spliced into it. That’s why we advance u by 
the expression next[next[prev[u]]], rather than just simply next[u]. 

Since the graph G is connected, every edge will eventually be visited and added 
to the tour T. As before, each edge is visited exactly once, so that at completion, 
T will consist of exactly |E| edges. Once a vertex u has had VISIT called on 
it, any future call of VISIT(u) will take O(1) time, and so the total time for all 
calls to VISIT is O(E). 


Solution to Problem 22-4 

Compute G^T in the usual way, so that G^T is G with its edges reversed. Then do 
a depth-first search on G^T, but in the main loop of DFS, consider the vertices in 
order of increasing values of L(v). If vertex u is in the depth-first tree with root v, 
then min(u) = v. Clearly, this algorithm takes O(V + E) time. 

To show correctness, first note that if u is in the depth-first tree rooted at v in G^T, 
then there is a path v ⇝ u in G^T, and so there is a path u ⇝ v in G. Thus, the 
minimum vertex label of all vertices reachable from u is at most L(v), or in other 
words, L(v) ≥ min {L(w) : w ∈ R(u)}. 

Now suppose that L(v) > min {L(w) : w ∈ R(u)}, so that there is a vertex 
w ∈ R(u) such that L(w) < L(v). At the time d[v] that we started the depth- 
first search from v, we would have already discovered w, so that d[w] < d[v]. 
By the parenthesis theorem, either the intervals [d[v], f[v]] and [d[w], f[w]] are 
disjoint and neither v nor w is a descendant of the other, or we have the ordering 
d[w] < d[v] < f[v] < f[w] and v is a descendant of w. The latter case cannot 
occur, since v is a root in the depth-first forest (which means that v cannot be a 
descendant of any other vertex). In the former case, since d[w] < d[v], we must 
have d[w] < f[w] < d[v] < f[v]. In this case, since u is reachable from w 
in G^T, we would have discovered u by the time f[w], so that d[u] < f[w]. Since 
we discovered u during a search that started at v, we have d[v] ≤ d[u]. Thus, 
d[v] ≤ d[u] < f[w] < d[v], which is a contradiction. We conclude that no such 
vertex w can exist. 
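
The procedure above can be sketched in Python as follows. This is only an illustrative sketch: the function name min_reachable_labels, the edge-list input, and the label dictionary are assumptions made for the example, and an explicit stack replaces the recursive DFS (any search order works, since only reachability in G^T matters). 

```python
from collections import defaultdict

def min_reachable_labels(vertices, edges, label):
    """For each vertex u, return the vertex of smallest label reachable from u.

    vertices: iterable of hashable vertex names
    edges: iterable of directed edges (u, v) of G
    label: dict mapping each vertex to its distinct label L(v)
    """
    # Build the transpose graph G^T by reversing every edge.
    transpose = defaultdict(list)
    for u, v in edges:
        transpose[v].append(u)

    result = {}
    # Main loop of the search: take roots in order of increasing label.
    for root in sorted(vertices, key=lambda v: label[v]):
        if root in result:
            continue  # already claimed by a root with a smaller label
        result[root] = root
        stack = [root]
        while stack:
            x = stack.pop()
            for y in transpose[x]:
                if y not in result:
                    result[y] = root  # y reaches root in the original graph G
                    stack.append(y)
    return result
```

Each vertex and each edge of G^T is handled once after the sort, so with labels that can be sorted in linear time the sketch matches the O(V + E) bound. 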
Chapter 23 overview 

Problem 

• A town has a set of houses and a set of roads. 

• A road connects 2 and only 2 houses. 

• A road connecting houses u and v has a repair cost w(u, v). 

• Goal: Repair enough (and no more) roads such that 

1. everyone stays connected: can reach every house from all other houses, and 

2. total repair cost is minimum. 

Model as a graph: 

• Undirected graph G = (V, E). 

• Weight w(u, v) on each edge (u, v) ∈ E. 

• Find T ⊆ E such that 

1. T connects all vertices (T is a spanning tree), and 

2. w(T) = Σ_{(u,v) ∈ T} w(u, v) is minimized. 

A spanning tree whose weight is minimum over all spanning trees is called a 
minimum spanning tree, or MST. 

Example of such a graph [edges in MST are shaded] : 



In this example, there is more than one MST. Replace edge (e, f) by (c, e). Get a 
different spanning tree with the same weight. 
Growing a minimum spanning tree 

Some properties of an MST: 

• It has |V| − 1 edges. 

• It has no cycles. 

• It might not be unique. 


Building up the solution 

• We will build a set A of edges. 

• Initially, A has no edges. 

• As we add edges to A, maintain a loop invariant: 

Loop invariant: A is a subset of some MST. 

• Add only edges that maintain the invariant. If A is a subset of some MST, an 
edge (u, v) is safe for A if and only if A ∪ {(u, v)} is also a subset of some 
MST. So we will add only safe edges. 


Generic MST algorithm 

GENERIC-MST(G, w) 
A ← ∅ 
while A is not a spanning tree 
    do find an edge (u, v) that is safe for A 
       A ← A ∪ {(u, v)} 
return A 

Use the loop invariant to show that this generic algorithm works. 

Initialization: The empty set trivially satisfies the loop invariant. 

Maintenance: Since we add only safe edges, A remains a subset of some MST. 

Termination: All edges added to A are in an MST, so when we stop, A is a 
spanning tree that is also an MST. 


Finding a safe edge 

How do we find safe edges? 

Let’s look at the example. Edge (c, f) has the lowest weight of any edge in the 
graph. Is it safe for A = ∅? 

Intuitively: Let S ⊂ V be any set of vertices that includes c but not f (so that 
f is in V − S). In any MST, there has to be one edge (at least) that connects S 
with V − S. Why not choose the edge with minimum weight? (Which would be 
(c, f) in this case.) 

Some definitions: Let S ⊂ V and A ⊆ E. 
• A cut (S, V − S) is a partition of vertices into disjoint sets S and V − S. 

• Edge (u, v) ∈ E crosses cut (S, V − S) if one endpoint is in S and the other is 
in V − S. 

• A cut respects A if and only if no edge in A crosses the cut. 

• An edge is a light edge crossing a cut if and only if its weight is minimum over 
all edges crossing the cut. For a given cut, there can be > 1 light edge crossing 
it. 
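
These definitions translate almost directly into code. The following Python sketch is only illustrative: the helper names crosses, respects, and light_edges, and the (weight, u, v) edge representation, are assumptions made for the example. 

```python
def crosses(edge, S):
    """True if exactly one endpoint of edge (weight, u, v) lies in S."""
    _, u, v = edge
    return (u in S) != (v in S)

def respects(S, A):
    """True if the cut (S, V - S) respects A: no edge of A crosses it."""
    return not any(crosses(e, S) for e in A)

def light_edges(edges, S):
    """Return every minimum-weight edge crossing the cut (S, V - S)."""
    crossing = [e for e in edges if crosses(e, S)]
    if not crossing:
        return []
    lightest = min(e[0] for e in crossing)
    # There can be more than one light edge, so return all of them.
    return [e for e in crossing if e[0] == lightest]
```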

Theorem 

Let A be a subset of some MST, (S, V − S) be a cut that respects A, and (u, v) be 
a light edge crossing (S, V − S). Then (u, v) is safe for A. 

Proof Let T be an MST that includes A. 

If T contains (u, v), done. 

So now assume that T does not contain (u, v). We’ll construct a different MST T′ 
that includes A ∪ {(u, v)}. 

Recall: a tree has unique path between each pair of vertices. Since T is an MST, it 
contains a unique path p between u and v. Path p must cross the cut (S, V − S) 
at least once. Let (x, y) be an edge of p that crosses the cut. From how we 
chose (u, v), must have w(u, v) ≤ w(x, y). 



[Except for the dashed edge (u, v), all edges shown are in T. A is some subset of 
the edges of T, but A cannot contain any edges that cross the cut (S, V − S), since 
this cut respects A. Shaded edges are the path p.] 

Since the cut respects A, edge (x, y) is not in A. 

To form T′ from T: 

• Remove (x, y). Breaks T into two components. 

• Add (u, v). Reconnects. 
So T′ = T − {(x, y)} ∪ {(u, v)}. 

T′ is a spanning tree. 

w(T′) = w(T) − w(x, y) + w(u, v) 
      ≤ w(T), 

since w(u, v) ≤ w(x, y). Since T′ is a spanning tree, w(T′) ≤ w(T), and T is an 
MST, then T′ must be an MST. 

Need to show that (u, v) is safe for A: 

• A ⊆ T and (x, y) ∉ A ⇒ A ⊆ T′. 

• A ∪ {(u, v)} ⊆ T′. 

• Since T′ is an MST, (u, v) is safe for A. ■ (theorem) 

So, in Generic-MST: 

• A is a forest containing connected components. Initially, each component is a 
single vertex. 

• Any safe edge merges two of these components into one. Each component is a 
tree. 

• Since an MST has exactly |V| − 1 edges, the while loop iterates |V| − 1 times. 
Equivalently, after adding |V| − 1 safe edges, we’re down to just one component. 

Corollary 

If C = (V_C, E_C) is a connected component in the forest G_A = (V, A) and (u, v) 
is a light edge connecting C to some other component in G_A (i.e., (u, v) is a light 
edge crossing the cut (V_C, V − V_C)), then (u, v) is safe for A. 

Proof Set S = V_C in the theorem. ■ (corollary) 

This naturally leads to the algorithm called Kruskal’s algorithm to solve the 
minimum-spanning-tree problem. 


Kruskal’s algorithm 

G = (V, E) is a connected, undirected, weighted graph, w : E → R. 

• Starts with each vertex being its own component. 

• Repeatedly merges two components into one by choosing the light edge that 
connects them (i.e., the light edge crossing the cut between them). 

• Scans the set of edges in monotonically increasing order by weight. 

• Uses a disjoint-set data structure to determine whether an edge connects 
vertices in different components. 
KRUSKAL(V, E, w) 
A ← ∅ 
for each vertex v ∈ V 
    do MAKE-SET(v) 
sort E into nondecreasing order by weight w 
for each (u, v) taken from the sorted list 
    do if FIND-SET(u) ≠ FIND-SET(v) 
          then A ← A ∪ {(u, v)} 
               UNION(u, v) 
return A 

Run through the above example to see how Kruskal’s algorithm works on it: 

(c, f) : safe 
(g, i) : safe 
(e, f) : safe 
(c, e) : reject 
(d, h) : safe 
(f, h) : safe 
(e, d) : reject 
(b, d) : safe 
(d, g) : safe 
(b, c) : reject 
(g, h) : reject 
(a, b) : safe 

At this point, we have only one component, so all other edges will be rejected. [We 
could add a test to the main loop of KRUSKAL to stop once |V| − 1 edges have 
been added to A.] 

Get the shaded edges shown in the figure. 

Suppose we had examined (c, e) before (e, f). Then would have found (c, e) safe 
and would have rejected (e, f). 

Analysis 

Initialize A: O(1) 

First for loop: |V| MAKE-SETs 

Sort E: O(E lg E) 

Second for loop: O(E) FIND-SETs and UNIONs 

• Assuming the implementation of disjoint-set data structure, already seen in 
Chapter 21, that uses union by rank and path compression: 

O((V + E) α(V)) + O(E lg E). 

• Since G is connected, |E| ≥ |V| − 1 ⇒ O(E α(V)) + O(E lg E). 

• α(|V|) = O(lg V) = O(lg E). 

• Therefore, total time is O(E lg E). 

• |E| ≤ |V|² ⇒ lg |E| = O(2 lg V) = O(lg V). 

• Therefore, O(E lg V) time. (If edges are already sorted, O(E α(V)), which is 
almost linear.) 
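
As a concrete illustration, here is a minimal Python sketch of Kruskal’s algorithm with a disjoint-set forest using union by rank and path compression. The function name kruskal and the (weight, u, v) edge tuples are assumptions made for this example, and the graph is assumed to be connected and undirected. 

```python
def kruskal(vertices, edges):
    """Return a list of MST edges (weight, u, v).

    vertices: iterable of hashable vertex names
    edges: iterable of (weight, u, v) tuples
    """
    parent = {v: v for v in vertices}  # MAKE-SET for every vertex
    rank = {v: 0 for v in vertices}

    def find_set(x):
        # First pass: locate the root of x's tree.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Second pass: path compression, pointing each node at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(x, y):
        # Union by rank; x and y must be roots of different trees.
        if rank[x] < rank[y]:
            x, y = y, x
        parent[y] = x
        if rank[x] == rank[y]:
            rank[x] += 1

    mst = []
    # Scan the edges in nondecreasing order by weight.
    for weight, u, v in sorted(edges, key=lambda e: e[0]):
        ru, rv = find_set(u), find_set(v)
        if ru != rv:  # (u, v) joins two different components, so it is safe
            mst.append((weight, u, v))
            union(ru, rv)
    return mst
```

The sort dominates the running time, in line with the O(E lg E) analysis above. 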


Prim’s algorithm 

• Builds one tree, so A is always a tree. 

• Starts from an arbitrary “root” r. 

• At each step, find a light edge crossing cut (V_A, V − V_A), where V_A = vertices 
that A is incident on. Add this edge to A. 



[Edges of A are shaded.] 

How to find the light edge quickly? 

Use a priority queue Q: 

• Each object is a vertex in V − V_A. 

• Key of v is minimum weight of any edge (u, v), where u ∈ V_A. 

• Then the vertex returned by EXTRACT-MIN is v such that there exists u ∈ V_A 
and (u, v) is light edge crossing (V_A, V − V_A). 

• Key of v is ∞ if v is not adjacent to any vertices in V_A. 

The edges of A will form a rooted tree with root r: 

• r is given as an input to the algorithm, but it can be any vertex. 

• Each vertex knows its parent in the tree by the attribute π[v] = parent of v. 
π[v] = NIL if v = r or v has no parent. 

• As algorithm progresses, A = {(v, π[v]) : v ∈ V − {r} − Q}. 

• At termination, V_A = V ⇒ Q = ∅, so MST is A = {(v, π[v]) : v ∈ V − {r}}. 
PRIM(V, E, w, r) 
Q ← ∅ 
for each u ∈ V 
    do key[u] ← ∞ 
       π[u] ← NIL 
       INSERT(Q, u) 
DECREASE-KEY(Q, r, 0)    ▷ key[r] ← 0 
while Q ≠ ∅ 
    do u ← EXTRACT-MIN(Q) 
       for each v ∈ Adj[u] 
           do if v ∈ Q and w(u, v) < key[v] 
                 then π[v] ← u 
                      DECREASE-KEY(Q, v, w(u, v)) 

Do example from previous graph. [Let a student pick the root.] 


Analysis 

Depends on how the priority queue is implemented: 

• Suppose Q is a binary heap. 

Initialize Q and first for loop: O(V lg V) 
Decrease key of r: O(lg V) 
while loop: |V| EXTRACT-MIN calls ⇒ O(V lg V) 
            ≤ |E| DECREASE-KEY calls ⇒ O(E lg V) 
Total: O(E lg V) 

• Suppose we could do DECREASE-KEY in O(1) amortized time. 

Then ≤ |E| DECREASE-KEY calls take O(E) time altogether ⇒ total time 
becomes O(V lg V + E). 

In fact, there is a way to do DECREASE-KEY in O(1) amortized time: 
Fibonacci heaps, in Chapter 20. 
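
For experimentation, Prim’s algorithm can be sketched in Python with the standard heapq module. Note the substitution: heapq has no DECREASE-KEY, so this sketch pushes a fresh entry whenever a key decreases and skips stale entries when they are popped. The heap may then hold up to O(E) entries, which still gives O(E lg V) time. The function name prim and the adjacency-dictionary input are assumptions made for the example, and vertex names are assumed to be mutually comparable so that ties on weight can be broken. 

```python
import heapq

def prim(adj, r):
    """Return the parent map pi of an MST rooted at r.

    adj: dict mapping each vertex to a list of (neighbor, weight) pairs;
         every undirected edge appears in both endpoints' lists.
    """
    key = {v: float("inf") for v in adj}
    pi = {v: None for v in adj}
    key[r] = 0
    in_tree = set()          # vertices already extracted from the queue
    heap = [(0, r)]
    while heap:
        k, u = heapq.heappop(heap)
        if u in in_tree or k > key[u]:
            continue         # stale entry left over from an earlier key decrease
        in_tree.add(u)
        for v, weight in adj[u]:
            if v not in in_tree and weight < key[v]:
                key[v] = weight          # plays the role of DECREASE-KEY
                pi[v] = u
                heapq.heappush(heap, (weight, v))
    return pi
```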



Solutions for Chapter 23: 
Minimum Spanning Trees 


Solution to Exercise 23.1-1 

Theorem 23.1 shows this. 

Let A be the empty set and S be any set containing u but not v. 


Solution to Exercise 23.1-4 

A triangle whose edge weights are all equal is a graph in which every edge is a light 
edge crossing some cut. But the triangle is cyclic, so it is not a minimum spanning 
tree. 


Solution to Exercise 23.1-6 

Suppose that for every cut of G, there is a unique light edge crossing the cut. Let 
us consider two minimum spanning trees, T and T′, of G. We will show that every 
edge of T is also in T′, which means that T and T′ are the same tree and hence 
there is a unique minimum spanning tree. 

Consider any edge (u, v) ∈ T. If we remove (u, v) from T, then T becomes 
disconnected, resulting in a cut (S, V − S). The edge (u, v) is a light edge crossing 
the cut (S, V − S) (by Exercise 23.1-3). Now consider the edge (x, y) ∈ T′ that 
crosses (S, V − S). It, too, is a light edge crossing this cut. Since the light edge 
crossing (S, V − S) is unique, the edges (u, v) and (x, y) are the same edge. Thus, 
(u, v) ∈ T′. Since we chose (u, v) arbitrarily, every edge in T is also in T′. 

Here’s a counterexample for the converse: 

Here, the graph is its own minimum spanning tree, and so the minimum spanning 
tree is unique. Consider the cut ({x}, {y, z}). Both of the edges (x, y) and (x, z) 
are light edges crossing the cut, and they are both light edges. 


Solution to Exercise 23.1-10 

Let w(T) = Σ_{(x,y) ∈ T} w(x, y). We have w′(T) = w(T) − k. Consider any other 
spanning tree T′, so that w(T) ≤ w(T′). 

If (x, y) ∉ T′, then w′(T′) = w(T′) ≥ w(T) > w′(T). 

If (x, y) ∈ T′, then w′(T′) = w(T′) − k ≥ w(T) − k = w′(T). 

Either way, w′(T) ≤ w′(T′), and so T is a minimum spanning tree for weight 
function w′. 


Solution to Exercise 23.2-4 

We know that Kruskal’s algorithm takes O(V) time for initialization, O(E lg E) 
time to sort the edges, and O(E α(V)) time for the disjoint-set operations, for a 
total running time of O(V + E lg E + E α(V)) = O(E lg E). 

If we knew that all of the edge weights in the graph were integers in the range 
from 1 to |V|, then we could sort the edges in O(V + E) time using counting 
sort. Since the graph is connected, V = O(E), and so the sorting time is 
reduced to O(E). This would yield a total running time of O(V + E + E α(V)) = 
O(E α(V)), again since V = O(E), and since E = O(E α(V)). The time to 
process the edges, not the time to sort them, is now the dominant term. 
Knowledge about the weights won’t help speed up any other part of the algorithm, since 
nothing besides the sort uses the weight values. 

If the edge weights were integers in the range from 1 to W for some constant W, 
then we could again use counting sort to sort the edges more quickly. This time, 
sorting would take O(E + W) = O(E) time, since W is a constant. As in the first 
part, we get a total running time of O(E α(V)). 
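
The sorting step can be sketched as follows. This is a bucket-style variant of counting sort, written under the assumption that edges are (weight, u, v) tuples with integer weights in the range 1 to max_weight; the function name counting_sort_edges is hypothetical. 

```python
def counting_sort_edges(edges, max_weight):
    """Sort (weight, u, v) edges with integer weights in 1..max_weight.

    Runs in O(E + max_weight) time and is stable.
    """
    # One bucket per possible weight; index 0 is unused.
    buckets = [[] for _ in range(max_weight + 1)]
    for edge in edges:
        buckets[edge[0]].append(edge)
    # Concatenate the buckets in increasing order of weight.
    return [edge for bucket in buckets for edge in bucket]
```

With max_weight = |V| this takes O(V + E) time, and with max_weight = W for a constant W it takes O(E) time, as in the two cases above. 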


Solution to Exercise 23.2-5 

The time taken by Prim’s algorithm is determined by the speed of the queue oper¬ 
ations. With the queue implemented as a Fibonacci heap, it takes 0(E + V lg V) 
time. 

Since the keys in the priority queue are edge weights, it might be possible to im¬ 
plement the queue even more efficiently when there are restrictions on the possible 
edge weights. 

We can improve the running time of Prim’s algorithm if W is a constant by 
implementing the queue as an array Q[0 .. W + 1] (using the W + 1 slot for key = ∞), 
where each slot holds a doubly linked list of vertices with that weight as their 
key. Then EXTRACT-MIN takes only O(W) = O(1) time (just scan for the first 
nonempty slot), and DECREASE-KEY takes only O(1) time (just remove the 
vertex from the list it’s in and insert it at the front of the list indexed by the new key). 
This gives a total running time of O(E), which is the best possible asymptotic time 
(since Ω(E) edges must be processed). 
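
A sketch of this array-of-slots queue in Python might look as follows. It is only illustrative: Python sets stand in for the doubly linked lists (both give constant-time insertion and removal), and the function name prim_bucket and the adjacency-dictionary input are assumptions made for the example. 

```python
def prim_bucket(adj, r, W):
    """Prim's algorithm for integer edge weights in 1..W, rooted at r.

    adj: dict mapping each vertex to a list of (neighbor, weight) pairs;
         every undirected edge appears in both endpoints' lists.
    Returns the parent map pi.
    """
    INF = W + 1                              # slot W + 1 stands for key = infinity
    key = {v: INF for v in adj}
    pi = {v: None for v in adj}
    key[r] = 0
    buckets = [set() for _ in range(W + 2)]  # slots 0 .. W + 1
    for v in adj:
        buckets[key[v]].add(v)
    in_queue = set(adj)
    while in_queue:
        # EXTRACT-MIN: scan for the first nonempty slot, O(W) time.
        slot = next(i for i, b in enumerate(buckets) if b)
        u = buckets[slot].pop()
        in_queue.remove(u)
        for v, weight in adj[u]:
            if v in in_queue and weight < key[v]:
                # DECREASE-KEY: move v from its old slot to the new one.
                buckets[key[v]].remove(v)
                key[v] = weight
                buckets[weight].add(v)
                pi[v] = u
    return pi
```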

However, if the range of edge weights is 1 to |V|, then EXTRACT-MIN takes 
Θ(V) time with this data structure. So the total time spent doing EXTRACT-MIN 
is Θ(V²), slowing the algorithm to Θ(E + V²) = Θ(V²). In this case, it is better 
to keep the Fibonacci-heap priority queue, which gave the O(E + V lg V) time. 

There are data structures not in the text that yield better running times: 

• The van Emde Boas data structure (mentioned in the chapter notes for Chapter 6 
and the introduction to Part V) gives an upper bound of 0(E + V lg lg V) time 
for Prim’s algorithm. 

• A redistributive heap (used in the single-source shortest-paths algorithm of 
Ahuja, Mehlhorn, Orlin, and Tarjan, and mentioned in the chapter notes for 
Chapter 24) gives an upper bound of O(E + V √(lg V)) for Prim’s algorithm. 


Solution to Exercise 23.2-7 

We start with the following lemma. 

Lemma 

Let T be a minimum spanning tree of G = (V, E), and consider a graph G′ = 
(V′, E′) for which G is a subgraph, i.e., V ⊆ V′ and E ⊆ E′. Let T̄ = E − T be 
the edges of G that are not in T. Then there is a minimum spanning tree of G′ that 
includes no edges in T̄. 

Proof By Exercise 23.2-1, there is a way to order the edges of E so that Kruskal’s 
algorithm, when run on G, produces the minimum spanning tree T. We will show 
that Kruskal’s algorithm, run on G′, produces a minimum spanning tree T′ that 
includes no edges in T̄. We assume that the edges in E are considered in the same 
relative order when Kruskal’s algorithm is run on G and on G′. We first state and 
prove the following claim. 

Claim 

For any pair of vertices u, v ∈ V, if these vertices are in the same set after Kruskal’s 
algorithm run on G considers any edge (x, y) ∈ E, then they are in the same set 
after Kruskal’s algorithm run on G′ considers (x, y). 

Proof of claim Let us order the edges of E by nondecreasing weight as ⟨(x_1, y_1), 
(x_2, y_2), ..., (x_k, y_k)⟩, where k = |E|. This sequence gives the order in which the 
edges of E are considered by Kruskal’s algorithm, whether it is run on G or on G′. 
We will use induction, with the inductive hypothesis that if u and v are in the same 
set after Kruskal’s algorithm run on G considers an edge (x_i, y_i), then they are in 
the same set after Kruskal’s algorithm run on G′ considers the same edge. We use 
induction on i. 

Basis: For the basis, i = 0. Kruskal’s algorithm run on G has not considered 
any edges, and so all vertices are in different sets. The inductive hypothesis holds 
trivially. 

Inductive step: We assume that any vertices that are in the same set after Kruskal’s 
algorithm run on G has considered edges ⟨(x_1, y_1), (x_2, y_2), ..., (x_{i−1}, y_{i−1})⟩ 
are in the same set after Kruskal’s algorithm run on G′ has considered the same 
edges. When Kruskal’s algorithm runs on G′, after it considers (x_{i−1}, y_{i−1}), it may 
consider some edges in E′ − E before considering (x_i, y_i). The edges in E′ − E may 
cause UNION operations to occur, but sets are never divided. Hence, any vertices 
that are in the same set after Kruskal’s algorithm run on G′ considers (x_{i−1}, y_{i−1}) 
are still in the same set when (x_i, y_i) is considered. 

When Kruskal’s algorithm run on G considers (x_i, y_i), either x_i and y_i are found 
to be in the same set or they are not. 

• If Kruskal’s algorithm run on G finds x_i and y_i to be in the same set, then 
no UNION operation occurs. The sets of vertices remain the same, and so the 
inductive hypothesis continues to hold after considering (x_i, y_i). 

• If Kruskal’s algorithm run on G finds x_i and y_i to be in different sets, then the 
operation UNION(x_i, y_i) will occur. Kruskal’s algorithm run on G′ will find 
that either x_i and y_i are in the same set or they are not. By the inductive 
hypothesis, when edge (x_i, y_i) is considered, all vertices in x_i’s set when Kruskal’s 
algorithm runs on G are in x_i’s set when Kruskal’s algorithm runs on G′, and 
the same holds for y_i. Regardless of whether Kruskal’s algorithm run on G′ 
finds x_i and y_i to already be in the same set, their sets are united after 
considering (x_i, y_i), and so the inductive hypothesis continues to hold after considering 
(x_i, y_i). ■ (claim) 

With the claim in hand, we suppose that some edge (u, v) ∈ T̄ is placed into T′. 
That means that Kruskal’s algorithm run on G found u and v to be in the same 
set (since (u, v) ∈ T̄) but Kruskal’s algorithm run on G′ found u and v to be in 
different sets (since (u, v) is placed into T′). This fact contradicts the claim, and we 
conclude that no edge in T̄ is placed into T′. Thus, by running Kruskal’s algorithm 
on G and G′, we demonstrate that there exists a minimum spanning tree of G′ that 
includes no edges in T̄. ■ (lemma) 

We use this lemma as follows. Let G′ = (V′, E′) be the graph G = (V, E) with 
the one new vertex and its incident edges added. Suppose that we have a minimum 
spanning tree T for G. We compute a minimum spanning tree for G′ by creating 
the graph G″ = (V′, E″), where E″ consists of the edges of T and the edges in 
E′ − E (i.e., the edges added to G that made G′), and then finding a minimum 
spanning tree T′ for G″. By the lemma, there is a minimum spanning tree for G′ 
that includes no edges of E − T. In other words, G′ has a minimum spanning tree 
that includes only edges in T and E′ − E; these edges comprise exactly the set E″. 
Thus, the minimum spanning tree T′ of G″ is also a minimum spanning tree 
of G′. 

Even though the proof of the lemma uses Kruskal’s algorithm, we are not required 
to use this algorithm to find T′. We can find a minimum spanning tree by any 
means we choose. Let us use Prim’s algorithm with a Fibonacci-heap priority 
queue. Since |V′| = |V| + 1 and |E″| ≤ 2|V| − 1 (E″ contains the |V| − 1 
edges of T and at most |V| edges in E′ − E), it takes O(V) time to construct G″, 
and the run of Prim’s algorithm with a Fibonacci-heap priority queue takes time 
O(E″ + V′ lg V′) = O(V lg V). Thus, if we are given a minimum spanning tree 
of G, we can compute a minimum spanning tree of G′ in O(V lg V) time. 
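
The update can be sketched in Python as below. The sketch builds G″ from the old tree edges and the new vertex’s edges and then runs Prim’s algorithm on it. It uses a binary heap from heapq rather than a Fibonacci heap; since G″ has only O(V) edges, that still gives O(V lg V) time. The function name add_vertex_to_mst and the (weight, u, v) edge tuples are assumptions made for the example, and vertex names are assumed to be mutually comparable so that ties on weight can be broken. 

```python
import heapq

def add_vertex_to_mst(tree_edges, new_vertex, new_edges):
    """Return an MST of G' given an MST of G and the edges at a new vertex.

    tree_edges: (weight, u, v) edges of a minimum spanning tree T of G
    new_edges: (weight, new_vertex, v) edges incident on the new vertex
    """
    # Build G'': only the edges of T plus the newly added edges.
    adj = {new_vertex: []}
    for weight, u, v in list(tree_edges) + list(new_edges):
        adj.setdefault(u, []).append((weight, u, v))
        adj.setdefault(v, []).append((weight, v, u))

    # Prim's algorithm on G'', growing the tree from the new vertex.
    visited = {new_vertex}
    heap = list(adj[new_vertex])
    heapq.heapify(heap)
    result = []
    while heap:
        weight, u, v = heapq.heappop(heap)  # candidate edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        result.append((weight, u, v))
        for edge in adj[v]:
            if edge[2] not in visited:
                heapq.heappush(heap, edge)
    return result
```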


Solution to Problem 23-1 


a. To see that the minimum spanning tree is unique, observe that since the graph 
is connected and all edge weights are distinct, then there is a unique light edge 
crossing every cut. By Exercise 23.1-6, the minimum spanning tree is unique. 

To see that the second-best minimum spanning tree need not be unique, here is 
a weighted, undirected graph with a unique minimum spanning tree of weight 7 
and two second-best minimum spanning trees of weight 8: 




[Figure: the example graph, its minimum spanning tree, and its two second-best 
minimum spanning trees.] 


b. Since any spanning tree has exactly |V| − 1 edges, any second-best minimum 
spanning tree must have at least one edge that is not in the (best) minimum 
spanning tree. If a second-best minimum spanning tree has exactly one edge, 
say (x, y), that is not in the minimum spanning tree, then it has the same set of 
edges as the minimum spanning tree, except that (x, y) replaces some edge, say 
(u, v), of the minimum spanning tree. In this case, T′ = T − {(u, v)} ∪ {(x, y)}, 
as we wished to show. 

Thus, all we need to show is that by replacing two or more edges of the 
minimum spanning tree, we cannot obtain a second-best minimum spanning tree. 
Let T be the minimum spanning tree of G, and suppose that there exists a 
second-best minimum spanning tree T′ that differs from T by two or more 
edges. There are at least two edges in T − T′, and let (u, v) be the edge in 
T − T′ with minimum weight. If we were to add (u, v) to T′, we would get 
a cycle c. This cycle contains some edge (x, y) in T′ − T (since otherwise, T 
would contain a cycle). 

We claim that w(x, y) > w(u, v). We prove this claim by contradiction, 
so let us assume that w(x, y) < w(u, v). (Recall the assumption that 
edge weights are distinct, so that we do not have to concern ourselves with 
w(x, y) = w(u, v).) If we add (x, y) to T, we get a cycle c′, which contains 
some edge (u′, v′) in T − T′ (since otherwise, T′ would contain a cycle). 
Therefore, the set of edges T″ = T − {(u′, v′)} ∪ {(x, y)} forms a spanning tree, and 
we must also have w(u′, v′) < w(x, y), since otherwise T″ would be a 
spanning tree with weight less than w(T). Thus, w(u′, v′) < w(x, y) < w(u, v), 
which contradicts our choice of (u, v) as the edge in T − T′ of minimum weight. 

Since the edges (u, v) and (x, y) would be on a common cycle c if we were 
to add (u, v) to T′, the set of edges T′ − {(x, y)} ∪ {(u, v)} is a spanning tree, 
and its weight is less than w(T′). Moreover, it differs from T (because it differs 
from T′ by only one edge). Thus, we have formed a spanning tree whose weight 
is less than w(T′) but is not T. Hence, T′ was not a second-best minimum 
spanning tree. 

c. We can fill in max[u, v] for all u, v ∈ V in O(V²) time by simply doing a 
search from each vertex u, having restricted the edges visited to those of the 
spanning tree T. It doesn’t matter what kind of search we do: breadth-first, 
depth-first, or any other kind. 

We’ll give pseudocode for both breadth-first and depth-first approaches. Each 
approach differs from the pseudocode given in Chapter 22 in that we don’t need 
to compute d or f values, and we’ll use the max table itself to record whether a 
vertex has been visited in a given search. In particular, max[u, v] = NIL if and 
only if u = v or we have not yet visited vertex v in a search from vertex u. Note 
also that since we’re visiting via edges in a spanning tree of an undirected graph, 
we are guaranteed that the search from each vertex u, whether breadth-first or 
depth-first, will visit all vertices. There will be no need to “restart” the search 
as is done in the DFS procedure of Section 22.3. Our pseudocode assumes that 
the adjacency list of each vertex consists only of edges in the spanning tree T. 

Here’s the breadth-first search approach: 

BFS-FILL-MAX(T, w) 
for each vertex u ∈ V 
    do for each vertex v ∈ V 
           do max[u, v] ← NIL 
       Q ← ∅ 
       ENQUEUE(Q, u) 
       while Q ≠ ∅ 
           do x ← DEQUEUE(Q) 
              for each v ∈ Adj[x] 
                  do if max[u, v] = NIL and v ≠ u 
                        then if x = u or w(x, v) > w(max[u, x]) 
                                then max[u, v] ← (x, v) 
                                else max[u, v] ← max[u, x] 
                             ENQUEUE(Q, v) 
return max 


Here’s the depth-first search approach: 
DFS-FILL-MAX(T, w) 
for each vertex u ∈ V 
    do for each vertex v ∈ V 
           do max[u, v] ← NIL 
       DFS-FILL-MAX-VISIT(u, u, max) 
return max 

DFS-FILL-MAX-VISIT(u, x, max) 
for each vertex v ∈ Adj[x] 
    do if max[u, v] = NIL and v ≠ u 
          then if x = u or w(x, v) > w(max[u, x]) 
                  then max[u, v] ← (x, v) 
                  else max[u, v] ← max[u, x] 
               DFS-FILL-MAX-VISIT(u, v, max) 

For either approach, we are filling in |V| rows of the max table. Since the 
number of edges in the spanning tree is |V| − 1, each row takes O(V) time to 
fill in. Thus, the total time to fill in the max table is O(V²). 
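
The breadth-first approach can be sketched in Python as follows. The function name fill_max, the adjacency dictionary restricted to tree edges, and the weight dictionary keyed by frozenset({u, v}) are assumptions made for this example. 

```python
from collections import deque

def fill_max(tree_adj, weight):
    """Compute the max table for a spanning tree T.

    tree_adj: dict mapping each vertex to its neighbors in T
    weight: dict mapping frozenset({u, v}) to the weight of tree edge (u, v)
    Returns max_edge, where max_edge[u][v] is the heaviest edge on the tree
    path from u to v, or None when u == v.
    """
    max_edge = {}
    for u in tree_adj:
        row = {v: None for v in tree_adj}   # None also marks "not yet visited"
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for v in tree_adj[x]:
                if row[v] is None and v != u:
                    if x == u or weight[frozenset((x, v))] > weight[frozenset(row[x])]:
                        row[v] = (x, v)     # edge (x, v) is the new maximum
                    else:
                        row[v] = row[x]     # maximum lies earlier on the path
                    queue.append(v)
        max_edge[u] = row
    return max_edge
```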

d. In part (b), we established that we can find a second-best minimum spanning 
tree by replacing just one edge of the minimum spanning tree T by some 
edge (u, v) not in T. As we know, if we create spanning tree T′ by replacing 
edge (x, y) ∈ T by edge (u, v) ∉ T, then w(T′) = w(T) − w(x, y) + w(u, v). 
For a given edge (u, v), the edge (x, y) ∈ T that minimizes w(T′) is the edge of 
maximum weight on the unique path between u and v in T. If we have already 
computed the max table from part (c) based on T, then the identity of this edge 
is precisely what is stored in max[u, v]. All we have to do is determine an edge 
(u, v) ∉ T for which w(u, v) − w(max[u, v]) is minimum. 

Thus, our algorithm to find a second-best minimum spanning tree goes as 
follows: 

1. Compute the minimum spanning tree T. Time: O(E + V lg V), using Prim’s 
algorithm with a Fibonacci-heap implementation of the priority queue. Since 
|E| < |V|², this running time is O(V²). 

2. Given the minimum spanning tree T, compute the max table, as in part (c). 
Time: O(V²). 

3. Find an edge (u, v) ∉ T that minimizes w(u, v) − w(max[u, v]). Time: 
O(E), which is O(V²). 

4. Having found an edge (u, v) in step 3, return T′ = T − {max[u, v]} ∪ {(u, v)} 
as a second-best minimum spanning tree. 

The total time is O(V²). 
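
The four steps can be sketched end to end in Python. This sketch makes two substitutions that should be kept in mind: step 1 uses Kruskal’s algorithm with a simple union-find instead of Prim’s algorithm with a Fibonacci heap (shorter to write, but O(E lg E) rather than O(E + V lg V)), and the max table is stored as nested dictionaries. The function name second_best_mst is hypothetical; the graph is assumed to be connected, with distinct edge weights and at least one edge outside the minimum spanning tree. 

```python
from collections import deque

def second_best_mst(vertices, edges):
    """Return (mst, second_best) as sets of (weight, u, v) edges."""
    vertices = list(vertices)

    # Step 1: minimum spanning tree T (Kruskal, used here for brevity).
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = set()
    for e in sorted(edges):
        ru, rv = find(e[1]), find(e[2])
        if ru != rv:
            parent[ru] = rv
            tree.add(e)

    # Step 2: max table via a breadth-first search of T from every vertex.
    adj = {v: [] for v in vertices}
    for e in tree:
        adj[e[1]].append((e[2], e))
        adj[e[2]].append((e[1], e))
    max_edge = {}
    for u in vertices:
        row = {u: None}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for v, e in adj[x]:
                if v not in row:
                    heavier = row[x] is None or e[0] > row[x][0]
                    row[v] = e if heavier else row[x]
                    queue.append(v)
        max_edge[u] = row

    # Step 3: non-tree edge (u, v) minimizing w(u, v) - w(max[u, v]).
    best = min((e for e in edges if e not in tree),
               key=lambda e: e[0] - max_edge[e[1]][e[2]][0])

    # Step 4: swap the heaviest edge on the tree path for the chosen edge.
    removed = max_edge[best[1]][best[2]]
    return tree, (tree - {removed}) | {best}
```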
Shortest paths 


How to find the shortest route between two points on a map. 

Input: 

• Directed graph G = (V, E) 

• Weight function w : E → R 

Weight of path p = ⟨v_0, v_1, ..., v_k⟩ 

    w(p) = Σ_{i=1}^{k} w(v_{i−1}, v_i) 
         = sum of edge weights on path p. 

Shortest-path weight u to v: 

    δ(u, v) = min {w(p) : p is a path u ⇝ v}   if there exists a path u ⇝ v, 
              ∞                                 otherwise. 

Shortest path u to v is any path p such that w(p) = δ(u, v). 

Example: shortest paths from s 

[d values appear inside vertices. Shaded edges show shortest paths.] 



This example shows that the shortest path might not be unique. 

It also shows that when we look at shortest paths from one vertex to all other 
vertices, the shortest paths are organized as a tree. 

Can think of weights as representing any measure that 
• accumulates linearly along a path, 

• we want to minimize. 

Examples: time, cost, penalties, loss. 

Generalization of breadth-first search to weighted graphs. 

Variants 

• Single-source: Find shortest paths from a given source vertex s ∈ V to every 
vertex v ∈ V. 

• Single-destination: Find shortest paths to a given destination vertex. 

• Single-pair: Find shortest path from u to v. No way known that’s better in 
worst case than solving single-source. 

• All-pairs: Find shortest path from u to v for all u, v ∈ V. We’ll see algorithms 
for all-pairs in the next chapter. 

Negative-weight edges 

OK, as long as no negative-weight cycles are reachable from the source. 

• If we have a negative-weight cycle, we can just keep going around it, and get 
δ(s, v) = −∞ for all v on the cycle. 

• But OK if the negative-weight cycle is not reachable from the source. 

• Some algorithms work only if there are no negative-weight edges in the graph. 
We’ll be clear when they’re allowed and not allowed. 

Optimal substructure 

Lemma 

Any subpath of a shortest path is a shortest path. 

Proof Cut-and-paste. 



Suppose this path p is a shortest path from u to v, decomposed into subpaths 
p_ux (u ⇝ x), p_xy (x ⇝ y), and p_yv (y ⇝ v). 
Then δ(u, v) = w(p) = w(p_ux) + w(p_xy) + w(p_yv). 

Now suppose there exists a shorter path p′_xy from x to y. 

Then w(p′_xy) < w(p_xy). 

Construct p′ by replacing p_xy in p with p′_xy. 

Then 

w(p′) = w(p_ux) + w(p′_xy) + w(p_yv) 
      < w(p_ux) + w(p_xy) + w(p_yv) 
      = w(p). 

So p wasn’t a shortest path after all! ■ (lemma) 

Cycles 

Shortest paths can’t contain cycles: 

• Already ruled out negative-weight cycles. 

• Positive-weight => we can get a shorter path by omitting the cycle. 

• Zero-weight: no reason to use them => assume that our solutions won’t use 
them. 

Output of single-source shortest-path algorithm 

For each vertex v ∈ V: 

• d[v] = δ(s, v). 

• Initially, d[v] = ∞. 

• Reduces as algorithms progress. But always maintain d[v] ≥ δ(s, v). 

• Call d[v] a shortest-path estimate. 

• π[v] = predecessor of v on a shortest path from s. 

• If no predecessor, π[v] = NIL. 

• π induces a tree: the shortest-path tree. 

• We won’t prove properties of π in lecture; see text. 

Initialization 

All the shortest-paths algorithms start with INIT-SINGLE-SOURCE. 

INIT-SINGLE-SOURCE(V, s) 
for each v ∈ V 
    do d[v] ← ∞ 
       π[v] ← NIL 
d[s] ← 0 

Relaxing an edge (u, v) 

Can we improve the shortest-path estimate for v by going through u and taking 
(u, v)? 

RELAX(u, v, w) 
if d[v] > d[u] + w(u, v) 
    then d[v] ← d[u] + w(u, v) 
         π[v] ← u 


u V 



For all the single-source shortest-paths algorithms we’ll look at, 

• start by calling INIT-SINGLE-SOURCE,

• then relax edges. 

The algorithms differ in the order and how many times they relax each edge. 
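
The two building blocks above can be sketched in Python as follows. This is only an illustration under assumptions: the dictionary representation of d and π, the function names, and the three-vertex example are not from the text.

```python
import math

def init_single_source(vertices, s):
    """Return (d, pi): every estimate is infinite except d[s] = 0."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}   # None plays the role of NIL
    d[s] = 0
    return d, pi

def relax(u, v, w, d, pi):
    """Relax edge (u, v) of weight w; return True if d[v] was lowered."""
    if d[v] > d[u] + w:
        d[v] = d[u] + w
        pi[v] = u
        return True
    return False

# Hypothetical three-vertex example.
d, pi = init_single_source(["s", "a", "b"], "s")
relax("s", "a", 4, d, pi)    # lowers d["a"] from infinity to 4
relax("a", "b", -1, d, pi)   # lowers d["b"] from infinity to 3
relax("s", "b", 5, d, pi)    # no change, since 3 <= 0 + 5
```

Every algorithm in this chapter amounts to a particular schedule of calls to relax after one call to init_single_source.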


Shortest-paths properties 

Based on calling Init-Single-Source once and then calling Relax zero or 
more times. 


Triangle inequality 

For all (u, v) ∈ E, we have δ(s, v) ≤ δ(s, u) + w(u, v).

Proof Weight of shortest path s ⇝ v is ≤ weight of any path s ⇝ v. Path
s ⇝ u → v is a path s ⇝ v, and if we use a shortest path s ⇝ u, its weight is
δ(s, u) + w(u, v). ■

Upper-bound property 

Always have d[v] ≥ δ(s, v) for all v. Once d[v] = δ(s, v), it never changes.

Proof Initially true.

Suppose there exists a vertex such that d[v] < δ(s, v).

Without loss of generality, v is first vertex for which this happens.

Let u be the vertex that causes d[v] to change.

Then d[v] = d[u] + w(u, v).

So,

d[v] < δ(s, v)

     ≤ δ(s, u) + w(u, v) (triangle inequality)

     ≤ d[u] + w(u, v) (v is first violation)

⇒ d[v] < d[u] + w(u, v) .

Contradicts d[v] = d[u] + w(u, v).

Once d[v] reaches δ(s, v), it never goes lower. It never goes up, since relaxations
only lower shortest-path estimates. ■


No-path property 

If δ(s, v) = ∞, then d[v] = ∞ always.

Proof d[v] ≥ δ(s, v) = ∞ ⇒ d[v] = ∞. ■


Convergence property 

If s ⇝ u → v is a shortest path, d[u] = δ(s, u), and we call RELAX(u, v, w), then
d[v] = δ(s, v) afterward.

Proof After relaxation:

d[v] ≤ d[u] + w(u, v) (RELAX code)

     = δ(s, u) + w(u, v)

     = δ(s, v) (lemma: optimal substructure)

Since d[v] ≥ δ(s, v), must have d[v] = δ(s, v). ■


Path relaxation property 

Let p = ⟨v_0, v_1, ..., v_k⟩ be a shortest path from s = v_0 to v_k. If we relax,
in order, (v_0, v_1), (v_1, v_2), ..., (v_{k−1}, v_k), even intermixed with other relaxations,
then d[v_k] = δ(s, v_k).

Proof Induction to show that d[v_i] = δ(s, v_i) after (v_{i−1}, v_i) is relaxed.

Basis: i = 0. Initially, d[v_0] = 0 = δ(s, v_0) = δ(s, s).

Inductive step: Assume d[v_{i−1}] = δ(s, v_{i−1}). Relax (v_{i−1}, v_i). By convergence
property, d[v_i] = δ(s, v_i) afterward and d[v_i] never changes. ■


The Bellman-Ford algorithm 

• Allows negative-weight edges. 

• Computes d[v] and π[v] for all v ∈ V.

• Returns TRUE if no negative-weight cycles reachable from s, FALSE otherwise.

BELLMAN-FORD(V, E, w, s)
INIT-SINGLE-SOURCE(V, s)
for i ← 1 to |V| − 1
    do for each edge (u, v) ∈ E
        do RELAX(u, v, w)
for each edge (u, v) ∈ E
    do if d[v] > d[u] + w(u, v)
        then return FALSE
return TRUE

Core: The first for loop relaxes all edges |V| − 1 times.
Time: Θ(VE).
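
A direct Python transcription of the pseudocode might look like the sketch below. The edge-list representation, the returned tuple, and the five-vertex example graph are assumptions made for illustration.

```python
import math

def bellman_ford(vertices, edges, s):
    """edges is a list of (u, v, weight) triples.
    Returns (ok, d, pi); ok is False when a negative-weight cycle
    is reachable from s."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    # |V| - 1 passes, each relaxing every edge once.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if d[v] > d[u] + w:
                d[v] = d[u] + w
                pi[v] = u
    # One more scan: any edge that could still be relaxed reveals a cycle.
    for u, v, w in edges:
        if d[v] > d[u] + w:
            return False, d, pi
    return True, d, pi

# Hypothetical example graph with negative edges but no negative cycle.
V = ["s", "t", "x", "y", "z"]
E = [("s", "t", 6), ("s", "y", 7), ("t", "x", 5), ("t", "y", 8),
     ("t", "z", -4), ("x", "t", -2), ("y", "x", -3), ("y", "z", 9),
     ("z", "x", 7), ("z", "s", 2)]
ok, d, pi = bellman_ford(V, E, "s")
```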

Example: [figure omitted]



Values you get on each pass and how quickly it converges depends on order of 
relaxation. 

But guaranteed to converge after |V| − 1 passes, assuming no negative-weight
cycles.

Proof Use path-relaxation property.

Let v be reachable from s, and let p = ⟨v_0, v_1, ..., v_k⟩ be a shortest path from s
to v, where v_0 = s and v_k = v. Since p is acyclic, it has ≤ |V| − 1 edges, so
k ≤ |V| − 1.

Each iteration of the for loop relaxes all edges:

• First iteration relaxes (v_0, v_1).

• Second iteration relaxes (v_1, v_2).

• kth iteration relaxes (v_{k−1}, v_k).

By the path-relaxation property, d[v] = d[v_k] = δ(s, v_k) = δ(s, v). ■

How about the TRUE/FALSE return value? 

• Suppose there is no negative-weight cycle reachable from s.

At termination, for all (u, v) ∈ E,

d[v] = δ(s, v)

     ≤ δ(s, u) + w(u, v) (triangle inequality)

     = d[u] + w(u, v) .

So BELLMAN-FORD returns TRUE.

• Now suppose there exists negative-weight cycle c = ⟨v_0, v_1, ..., v_k⟩, where
v_0 = v_k, reachable from s.

Then Σ_{i=1}^{k} w(v_{i−1}, v_i) < 0 .

Suppose (for contradiction) that BELLMAN-FORD returns TRUE.

Then d[v_i] ≤ d[v_{i−1}] + w(v_{i−1}, v_i) for i = 1, 2, ..., k.

Sum around c:

Σ_{i=1}^{k} d[v_i] ≤ Σ_{i=1}^{k} (d[v_{i−1}] + w(v_{i−1}, v_i))

                   = Σ_{i=1}^{k} d[v_{i−1}] + Σ_{i=1}^{k} w(v_{i−1}, v_i) .

Each vertex appears once in each summation Σ_{i=1}^{k} d[v_i] and Σ_{i=1}^{k} d[v_{i−1}] ⇒

0 ≤ Σ_{i=1}^{k} w(v_{i−1}, v_i) .

This contradicts c being a negative-weight cycle! ■


Single-source shortest paths in a directed acyclic graph 

Since a dag, we’re guaranteed no negative-weight cycles. 

DAG-SHORTEST-PATHS(V, E, w, s)
topologically sort the vertices
INIT-SINGLE-SOURCE(V, s)
for each vertex u, taken in topologically sorted order
    do for each vertex v ∈ Adj[u]
        do RELAX(u, v, w)


Example: [figure omitted]

Time: Θ(V + E).

Correctness: Because we process vertices in topologically sorted order, edges of 
any path must be relaxed in order of appearance in the path. 

=> Edges on any shortest path are relaxed in order. 

=> By path-relaxation property, correct. ■ 
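
The dag procedure can be sketched in Python as below. The adjacency-dictionary representation and the six-vertex example are assumptions; the topological sort is delegated to the standard-library graphlib module (Python 3.9 or later) rather than written out.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def dag_shortest_paths(adj, s):
    """adj maps every vertex to a dict {successor: weight}."""
    sorter = TopologicalSorter()
    for u, successors in adj.items():
        sorter.add(u)
        for v in successors:
            sorter.add(v, u)          # u must come before v
    d = {v: math.inf for v in adj}
    pi = {v: None for v in adj}
    d[s] = 0
    # Relax the edges leaving each vertex, in topologically sorted order.
    for u in sorter.static_order():
        for v, w in adj[u].items():
            if d[v] > d[u] + w:
                d[v] = d[u] + w
                pi[v] = u
    return d, pi

# Hypothetical dag; r precedes the source s, so it stays at infinity.
adj = {
    "r": {"s": 5, "t": 3},
    "s": {"t": 2, "x": 6},
    "t": {"x": 7, "y": 4, "z": 2},
    "x": {"y": -1, "z": 1},
    "y": {"z": -2},
    "z": {},
}
d, pi = dag_shortest_paths(adj, "s")
```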


Dijkstra’s algorithm

No negative-weight edges. 

Essentially a weighted version of breadth-first search. 

• Instead of a FIFO queue, uses a priority queue. 

• Keys are shortest-path estimates (d[v]).

Have two sets of vertices: 

• S = vertices whose final shortest-path weights are determined, 

• Q = priority queue = V — S. 

DIJKSTRA(V, E, w, s)
INIT-SINGLE-SOURCE(V, s)
S ← ∅
Q ← V        ▷ i.e., insert all vertices into Q
while Q ≠ ∅
    do u ← EXTRACT-MIN(Q)
       S ← S ∪ {u}
       for each vertex v ∈ Adj[u]
           do RELAX(u, v, w)

• Looks a lot like Prim’s algorithm, but computing d[v], and using shortest-path
weights as keys.

• Dijkstra’s algorithm can be viewed as greedy, since it always chooses the “lightest”
(“closest”?) vertex in V − S to add to S.
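
A compact Python sketch of the algorithm follows. One swapped-in technique should be named plainly: Python's heapq module has no DECREASE-KEY, so this version pushes a fresh entry on every successful relaxation and skips stale entries when they are popped. The adjacency-dictionary input and the example graph are also assumptions.

```python
import heapq
import math

def dijkstra(adj, s):
    """adj maps every vertex to {successor: nonnegative weight}.
    Returns (d, pi, order), where order lists vertices as they join S."""
    d = {v: math.inf for v in adj}
    pi = {v: None for v in adj}
    d[s] = 0
    in_S = set()
    order = []
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u in in_S:
            continue                  # stale entry left by an earlier push
        in_S.add(u)
        order.append(u)
        for v, w in adj[u].items():
            if d[v] > du + w:         # RELAX(u, v, w)
                d[v] = du + w
                pi[v] = u
                heapq.heappush(heap, (d[v], v))
    return d, pi, order

# Hypothetical example graph with nonnegative weights.
adj = {
    "s": {"t": 10, "y": 5},
    "t": {"x": 1, "y": 2},
    "x": {"z": 4},
    "y": {"t": 3, "x": 9, "z": 2},
    "z": {"s": 7, "x": 6},
}
d, pi, order = dijkstra(adj, "s")
```

With lazy deletion the heap can hold up to |E| entries, so this sketch runs in O(E lg V) time, matching the binary-heap bound.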

Example: [figure omitted]

Order of adding to S: s, y, z, x.

Correctness: 

Loop invariant: At the start of each iteration of the while loop, d[v] =
δ(s, v) for all v ∈ S.

Initialization: Initially, S = ∅, so trivially true.

Termination: At end, Q = ∅ ⇒ S = V ⇒ d[v] = δ(s, v) for all v ∈ V.

Maintenance: Need to show that d[u] = δ(s, u) when u is added to S in each
iteration.

Suppose there exists u such that d[u] ≠ δ(s, u). Without loss of generality,
let u be the first vertex for which d[u] ≠ δ(s, u) when u is added to S.

Observations:

• u ≠ s, since d[s] = δ(s, s) = 0.

• Therefore, s ∈ S, so S ≠ ∅.

• There must be some path s ⇝ u, since otherwise d[u] = δ(s, u) = ∞ by
no-path property.

So, there’s a path s ⇝ u.

This means there’s a shortest path p from s to u.

Just before u is added to S, path p connects a vertex in S (i.e., s) to a vertex in
V − S (i.e., u).

Let y be first vertex along p that’s in V − S, and let x ∈ S be y’s predecessor.

Decompose p into a subpath p_1 from s to x, the edge (x, y), and a subpath p_2 from y to u.
(Could have x = s or y = u, so that p_1 or p_2 may have no edges.)

Claim

d[y] = δ(s, y) when u is added to S.

Proof x ∈ S and u is the first vertex such that d[u] ≠ δ(s, u) when u is added
to S ⇒ d[x] = δ(s, x) when x is added to S. Relaxed (x, y) at that time, so by
the convergence property, d[y] = δ(s, y). ■ (claim)

Now can get a contradiction to d[u] ≠ δ(s, u):

y is on shortest path s ⇝ u, and all edge weights are nonnegative
⇒ δ(s, y) ≤ δ(s, u) ⇒

d[y] = δ(s, y)

     ≤ δ(s, u)

     ≤ d[u] (upper-bound property) .

Also, both y and u were in Q when we chose u, so
d[u] ≤ d[y] ⇒ d[u] = d[y] .

Therefore, d[y] = δ(s, y) = δ(s, u) = d[u].

Contradicts assumption that d[u] ≠ δ(s, u). Hence, Dijkstra’s algorithm is
correct. ■

Analysis: Like Prim’s algorithm, depends on implementation of priority queue.

• If binary heap, each operation takes O(lg V) time ⇒ O(E lg V).

• If a Fibonacci heap:

• Each DECREASE-KEY takes O(1) amortized time, and there are at most |E| of them.

• There are O(V) other operations (INSERT and EXTRACT-MIN), taking O(lg V) amortized time each.

• Therefore, time is O(V lg V + E).


Difference constraints 

Given a set of inequalities of the form x_j − x_i ≤ b_k.

• x’s are variables, 1 ≤ i, j ≤ n,

• b’s are constants, 1 ≤ k ≤ m.

Want to find a set of values for the x’s that satisfy all m inequalities, or determine
that no such values exist. Call such a set of values a feasible solution.

Example:

x_1 − x_2 ≤ 5

x_1 − x_3 ≤ 6

x_2 − x_4 ≤ −1

x_3 − x_4 ≤ −2

x_4 − x_1 ≤ −3

Solution: x = (0, −4, −5, −3)

Also: x = (5, 1, 0, 2) = [above solution] + 5

Lemma

If x is a feasible solution, then so is x + d for any constant d.

Proof x is a feasible solution ⇒ x_j − x_i ≤ b_k for all i, j, k

⇒ (x_j + d) − (x_i + d) ≤ b_k. ■ (lemma)

Constraint graph

G = (V, E), weighted, directed.

• V = {v_0, v_1, v_2, ..., v_n}: one vertex per variable + v_0

• E = {(v_i, v_j) : x_j − x_i ≤ b_k is a constraint} ∪ {(v_0, v_1), (v_0, v_2), ..., (v_0, v_n)}

• w(v_0, v_j) = 0 for all j

• w(v_i, v_j) = b_k if x_j − x_i ≤ b_k

Theorem

Given a system of difference constraints, let G = (V, E) be the corresponding
constraint graph.

1. If G has no negative-weight cycles, then

x = (δ(v_0, v_1), δ(v_0, v_2), ..., δ(v_0, v_n))

is a feasible solution.

2. If G has a negative-weight cycle, then there is no feasible solution.

Proof

1. Show no negative-weight cycles ⇒ feasible solution.

Need to show that x_j − x_i ≤ b_k for all constraints. Use

x_j = δ(v_0, v_j)

x_i = δ(v_0, v_i)

b_k = w(v_i, v_j) .

By the triangle inequality,

δ(v_0, v_j) ≤ δ(v_0, v_i) + w(v_i, v_j)

x_j ≤ x_i + b_k
x_j − x_i ≤ b_k .

Therefore, feasible.

2. Show negative-weight cycles ⇒ no feasible solution.

Without loss of generality, let a negative-weight cycle be c = ⟨v_1, v_2, ..., v_k⟩,
where v_1 = v_k. (v_0 can’t be on c, since v_0 has no entering edges.) c corresponds
to the constraints

x_2 − x_1 ≤ w(v_1, v_2) ,

x_3 − x_2 ≤ w(v_2, v_3) ,

   ...

x_{k−1} − x_{k−2} ≤ w(v_{k−2}, v_{k−1}) ,

x_k − x_{k−1} ≤ w(v_{k−1}, v_k) .


[The last two inequalities above are incorrect in the first three printings of the
book. They were corrected in the fourth printing.]

If x is a solution satisfying these inequalities, it must satisfy their sum.

So add them up.

Each x_i is added once and subtracted once. (v_1 = v_k ⇒ x_1 = x_k.)
We get 0 ≤ w(c).

But w(c) < 0, since c is a negative-weight cycle.

Contradiction ⇒ no such feasible solution x exists.

How to find a feasible solution 

1. Form constraint graph. 

• n + 1 vertices. 

• m + n edges. 

• Θ(m + n) time.

2. Run BELLMAN-FORD from v_0.

• O((n + 1)(m + n)) = O(n² + nm) time.

3. If BELLMAN-FORD returns FALSE ⇒ no feasible solution.

If BELLMAN-FORD returns TRUE ⇒ set x_i = δ(v_0, v_i) for all i.
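
The three steps can be sketched in Python as below, applied to the five-constraint example from earlier in this section. The function name and the triple encoding (j, i, b) for x_j − x_i ≤ b are assumptions made for illustration.

```python
import math

def solve_difference_constraints(n, constraints):
    """constraints is a list of (j, i, b) triples meaning x_j - x_i <= b,
    with variables numbered 1..n. Returns [x_1, ..., x_n], or None when
    the system has no feasible solution."""
    # Step 1: constraint graph. Edge (v_i, v_j) of weight b per constraint,
    # plus a 0-weight edge from v_0 to every other vertex.
    edges = [(i, j, b) for (j, i, b) in constraints]
    edges += [(0, j, 0) for j in range(1, n + 1)]
    # Step 2: Bellman-Ford from v_0 (n + 1 vertices, so n passes).
    d = [math.inf] * (n + 1)
    d[0] = 0
    for _ in range(n):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    # Step 3: a still-relaxable edge means a negative-weight cycle.
    for u, v, w in edges:
        if d[u] + w < d[v]:
            return None
    return d[1:]

example = [(1, 2, 5), (1, 3, 6), (2, 4, -1), (3, 4, -2), (4, 1, -3)]
x = solve_difference_constraints(4, example)   # [0, -4, -5, -3]
```

The result agrees with the solution x = (0, −4, −5, −3) listed for the example.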


(theorem) 



Solutions for Chapter 24: 
Single-Source Shortest Paths 


Solution to Exercise 24.1-3 

The proof of the convergence property shows that for every vertex v, the shortest-path
estimate d[v] has attained its final value after length (any shortest-weight path
to v) iterations of BELLMAN-FORD. Thus after m passes, BELLMAN-FORD can
terminate. We don’t know m in advance, so we can’t make the algorithm loop
exactly m times and then terminate. But if we just make the algorithm stop when
nothing changes any more, it will stop after m + 1 iterations (i.e., after one iteration
that makes no changes).

BELLMAN-FORD-(M+1)(G, w, s)
INITIALIZE-SINGLE-SOURCE(G, s)
changes ← TRUE
while changes = TRUE
    do changes ← FALSE
       for each edge (u, v) ∈ E[G]
           do RELAX-M(u, v, w)

RELAX-M(u, v, w)
if d[v] > d[u] + w(u, v)
    then d[v] ← d[u] + w(u, v)
         π[v] ← u
         changes ← TRUE

The test for a negative-weight cycle (based on there being a d that would change
if another relaxation step was done) has been removed above, because this version
of the algorithm will never get out of the while loop unless all d’s stop changing.
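
A Python sketch of this early-terminating variant is given below. It returns the number of passes so the m + 1 bound can be observed; that return value, the edge-list input, and the example path are assumptions. Like the pseudocode, it does not terminate if a negative-weight cycle is reachable from s.

```python
import math

def bellman_ford_early_stop(vertices, edges, s):
    """edges is a list of (u, v, weight) triples. Assumes no negative-weight
    cycle is reachable from s. Returns (d, pi, passes)."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    passes = 0
    changes = True
    while changes:
        changes = False
        passes += 1
        for u, v, w in edges:
            if d[v] > d[u] + w:
                d[v] = d[u] + w
                pi[v] = u
                changes = True
    return d, pi, passes

# Hypothetical path s -> a -> b -> c, so m = 3.
V = ["s", "a", "b", "c"]
worst = [("b", "c", 1), ("a", "b", 1), ("s", "a", 1)]   # edges listed backward
best = list(reversed(worst))                            # edges listed forward
d_worst, _, passes_worst = bellman_ford_early_stop(V, worst, "s")
d_best, _, passes_best = bellman_ford_early_stop(V, best, "s")
```

With the edges listed backward, each pass settles one more vertex and the loop runs m + 1 = 4 times; with the edges listed forward, it stops after 2 passes.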


Solution to Exercise 24.2-3 

We’ll give two ways to transform a PERT chart G = (V, E) with weights on
vertices to a PERT chart G' = (V', E') with weights on edges. In each way, we’ll
have that |V'| ≤ 2|V| and |E'| ≤ |V| + |E|. We can then run on G' the same
algorithm to find a longest path through a dag as is given in Section 24.2 of the
text.

In the first way, we transform each vertex v ∈ V into two vertices v' and v'' in V'.
All edges in E that enter v will enter v' in E', and all edges in E that leave v will
leave v'' in E'. In other words, if (u, v) ∈ E, then (u'', v') ∈ E'. All such edges
have weight 0. We also put edges (v', v'') into E' for all vertices v ∈ V, and these
edges are given the weight of the corresponding vertex v in G. Thus, |V'| = 2|V|,
|E'| = |V| + |E|, and the edge weight of each path in G' equals the vertex weight
of the corresponding path in G.

In the second way, we leave vertices in V alone, but we add one new source vertex s
to V', so that V' = V ∪ {s}. All edges of E are in E', and E' also includes an
edge (s, v) for every vertex v ∈ V that has in-degree 0 in G. Thus, the only vertex
with in-degree 0 in G' is the new source s. The weight of edge (u, v) ∈ E' is the
weight of vertex v in G. In other words, the weight of each entering edge in G' is
the weight of the vertex it enters in G. In effect, we have “pushed back” the weight
of each vertex onto the edges that enter it. Here, |V'| = |V| + 1, |E'| ≤ |V| + |E|
(since no more than |V| vertices have in-degree 0 in G), and again the edge weight
of each path in G' equals the vertex weight of the corresponding path in G.


Solution to Exercise 24.3-3 

Yes, the algorithm still works. Let u be the leftover vertex that does not get extracted
from the priority queue Q. If u is not reachable from s, then d[u] =
δ(s, u) = ∞. If u is reachable from s, there is a shortest path p = s ⇝ x → u.
When the node x was extracted, d[x] = δ(s, x) and then the edge (x, u) was relaxed;
thus, d[u] = δ(s, u).


Solution to Exercise 24.3-4 

To find the most reliable path between s and t, run Dijkstra’s algorithm with edge
weights w(u, v) = −lg r(u, v) to find shortest paths from s in O(E + V lg V) time.
The most reliable path is the shortest path from s to t, and that path’s reliability is
the product of the reliabilities of its edges.

Here’s why this method works. Because the probabilities are independent, the
probability that a path will not fail is the product of the probabilities that its edges
will not fail. We want to find a path p from s to t such that Π_{(u,v)∈p} r(u, v) is maximized.
This is equivalent to maximizing lg(Π_{(u,v)∈p} r(u, v)) = Σ_{(u,v)∈p} lg r(u, v), which
is in turn equivalent to minimizing Σ_{(u,v)∈p} −lg r(u, v). (Note: r(u, v) can be 0,
and lg 0 is undefined. So in this algorithm, define lg 0 = −∞.) Thus if we assign
weights w(u, v) = −lg r(u, v), we have a shortest-path problem.

Since lg 1 = 0, lg x < 0 for 0 < x < 1, and we have defined lg 0 = −∞, all the
weights w are nonnegative, and we can use Dijkstra’s algorithm to find the shortest
paths from s in O(E + V lg V) time.
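
The transformation can be sketched in Python as follows. The sketch uses a binary heap with lazy deletion (so it runs in O(E lg V) rather than the Fibonacci-heap bound quoted above), and the function name, input format, and example network are assumptions.

```python
import heapq
import math

def most_reliable_path(adj, s, t):
    """adj maps every vertex to {successor: reliability in [0, 1]}.
    Returns (reliability, path); (0.0, []) if no usable path exists."""
    def weight(r):
        # w(u, v) = -lg r(u, v), with lg 0 defined as -infinity.
        return math.inf if r == 0 else -math.log2(r)

    d = {v: math.inf for v in adj}
    pi = {v: None for v in adj}
    d[s] = 0.0
    done = set()
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, r in adj[u].items():
            nd = du + weight(r)
            if nd < d[v]:
                d[v] = nd
                pi[v] = u
                heapq.heappush(heap, (nd, v))
    if d[t] == math.inf:
        return 0.0, []
    path = []
    v = t
    while v is not None:
        path.append(v)
        v = pi[v]
    path.reverse()
    return 2.0 ** (-d[t]), path       # undo the log transform

# Hypothetical network: the two-hop route beats the direct channel.
adj = {"s": {"a": 0.9, "t": 0.5, "b": 1.0},
       "a": {"t": 0.9},
       "b": {"t": 0.0},
       "t": {}}
reliability, path = most_reliable_path(adj, "s", "t")
```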

Alternate answer

You can also work with the original probabilities by running a modified version of
Dijkstra’s algorithm that maximizes the product of reliabilities along a path instead
of minimizing the sum of weights along a path.

In Dijkstra’s algorithm, use the reliabilities as edge weights and substitute

• max (and EXTRACT-MAX) for min (and EXTRACT-MIN) in relaxation and the
queue,

• × for + in relaxation,

• 1 (identity for ×) for 0 (identity for +) and −∞ (identity for max) for ∞ (identity
for min).

For example, the following is used instead of the usual RELAX procedure:

RELAX-RELIABILITY(u, v, r)
if d[v] < d[u] · r(u, v)
    then d[v] ← d[u] · r(u, v)
         π[v] ← u

This algorithm is isomorphic to the one above: It performs the same operations
except that it is working with the original probabilities instead of the transformed
ones.


Solution to Exercise 24.3-6 

Observe that if a shortest-path estimate is not ∞, then it’s at most (|V| − 1)W.
Why? In order to have d[v] < ∞, we must have relaxed an edge (u, v) with
d[u] < ∞. By induction, we can show that if we relax (u, v), then d[v] is at most
the number of edges on a path from s to v times the maximum edge weight. Since
any acyclic path has at most |V| − 1 edges and the maximum edge weight is W,
we see that d[v] ≤ (|V| − 1)W. Note also that d[v] must also be an integer, unless
it is ∞.

We also observe that in Dijkstra’s algorithm, the values returned by the EXTRACT-MIN
calls are monotonically increasing over time. Why? After we do our initial
|V| INSERT operations, we never do another. The only other way that a key value
can change is by a DECREASE-KEY operation. Since edge weights are nonnegative,
when we relax an edge (u, v), we have that d[u] ≤ d[v]. Since u is the
minimum vertex that we just extracted, we know that any other vertex we extract
later has a key value that is at least d[u].

When keys are known to be integers in the range 0 to k and the key values extracted
are monotonically increasing over time, we can implement a min-priority queue so
that any sequence of m INSERT, EXTRACT-MIN, and DECREASE-KEY operations
takes O(m + k) time. Here’s how. We use an array, say A[0 .. k], where A[j] is
a linked list of each element whose key is j. Think of A[j] as a bucket for all
elements with key j. We implement each bucket by a circular, doubly linked list
with a sentinel, so that we can insert into or delete from each bucket in O(1) time.
We perform the min-priority queue operations as follows:

• INSERT: To insert an element with key j, just insert it into the linked list
in A[j]. Time: O(1) per INSERT.

• EXTRACT-MIN: We maintain an index min of the value of the smallest key
extracted. Initially, min is 0. To find the smallest key, look in A[min] and, if this
list is nonempty, use any element in it, removing the element from the list and
returning it to the caller. Otherwise, we rely on the monotonicity property (and
that there is no INCREASE-KEY operation) and increment min until we either
find a list A[min] that is nonempty (using any element in A[min] as before)
or we run off the end of the array A (in which case the min-priority queue is
empty).

Since there are at most m INSERT operations, there are at most m elements in
the min-priority queue. We increment min at most k times, and we remove and
return some element at most m times. Thus, the total time over all EXTRACT-MIN
operations is O(m + k).

• DECREASE-KEY: To decrease the key of an element from j to i, first check
whether i ≤ j, flagging an error if not. Otherwise, we remove the element
from its list A[j] in O(1) time and insert it into the list A[i] in O(1) time.
Time: O(1) per DECREASE-KEY.

To apply this kind of min-priority queue to Dijkstra’s algorithm, we need to let
k = (|V| − 1)W, and we also need a separate list for keys with value ∞. The number
of operations m is O(V + E) (since there are |V| INSERT and |V| EXTRACT-MIN
operations and at most |E| DECREASE-KEY operations), and so the total time
is O(V + E + VW) = O(VW + E).
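
The bucket scheme can be sketched in Python as below. Two liberties are taken and should be read as assumptions: Python sets stand in for the circular doubly linked lists (they also give O(1) expected insertion and deletion), and vertices with key ∞ are simply kept out of the buckets instead of being stored in a separate list.

```python
import math

def dijkstra_buckets(adj, s, W):
    """adj maps every vertex to {successor: integer weight in 0..W}."""
    k = (len(adj) - 1) * W
    buckets = [set() for _ in range(k + 1)]    # A[0 .. k]
    d = {v: math.inf for v in adj}
    d[s] = 0
    buckets[0].add(s)
    cur = 0                                    # the index "min" from the text
    while cur <= k:
        if not buckets[cur]:
            cur += 1                           # advanced at most k times overall
            continue
        u = buckets[cur].pop()                 # EXTRACT-MIN
        for v, w in adj[u].items():
            nd = cur + w
            if nd < d[v]:
                if d[v] != math.inf:
                    buckets[d[v]].discard(v)   # DECREASE-KEY: leave old bucket
                d[v] = nd
                buckets[nd].add(v)             # first finite key or lowered key
    return d

# Hypothetical example graph with integer weights at most W = 10.
adj = {
    "s": {"t": 10, "y": 5},
    "t": {"x": 1, "y": 2},
    "x": {"z": 4},
    "y": {"t": 3, "x": 9, "z": 2},
    "z": {"s": 7, "x": 6},
}
d = dijkstra_buckets(adj, "s", 10)
```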


Solution to Exercise 24.3-7 

First, observe that at any time, there are at most W + 2 distinct key values in
the priority queue. Why? A key value is either ∞ or it is not. Consider what
happens whenever a key value d[v] becomes finite. It must have occurred due
to the relaxation of an edge (u, v). At that time, u was being placed into S, and
d[u] ≤ d[y] for all vertices y ∈ V − S. After relaxing edge (u, v), we have
d[v] ≤ d[u] + W. Since any other vertex y ∈ V − S with d[y] < ∞ also had its
estimate changed by a relaxation of some edge (x, y) with d[x] ≤ d[u], we must have
d[y] ≤ d[x] + W ≤ d[u] + W. Thus, at the time that we are relaxing edges from a
vertex u, we must have, for all vertices v ∈ V − S, that d[u] ≤ d[v] ≤ d[u] + W
or d[v] = ∞. Since shortest-path estimates are integer values (except for ∞),
at any given moment we have at most W + 2 different ones: d[u], d[u] + 1,
d[u] + 2, ..., d[u] + W and ∞.

Therefore, we can maintain the min-priority queue as a binary min-heap in which
each node points to a doubly linked list of all vertices with a given key value. There
are at most W + 2 nodes in the heap, and so EXTRACT-MIN runs in O(lg W)
time. To perform DECREASE-KEY, we need to be able to find the heap node
corresponding to a given key in O(lg W) time. We can do so in O(1) time as
follows. First, keep a pointer inf to the node containing all the ∞ keys. Second,
maintain an array loc[0 .. W], where loc[i] points to the unique heap entry whose
key value is congruent to i (mod (W + 1)). As keys move around in the heap, we
can update this array in O(1) time per movement.

Alternatively, instead of using a binary min-heap, we could use a red-black tree.
Now INSERT, DELETE, MINIMUM, and SEARCH (from which we can construct
the priority-queue operations) each run in O(lg W) time.


Solution to Exercise 24.4-4 

Let δ(u) be the shortest-path weight from s to u. Then we want to find δ(t).

δ must satisfy

δ(s) = 0

δ(v) − δ(u) ≤ w(u, v) for all (u, v) ∈ E (Lemma 24.10)

where w(u, v) is the weight of edge (u, v).

Thus x_v = δ(v) is a solution to

x_s = 0

x_v − x_u ≤ w(u, v) .

To turn this into a set of inequalities of the required form, replace x_s = 0 by x_s ≤ 0
and −x_s ≤ 0 (i.e., x_s ≥ 0). The constraints are now

x_s ≤ 0 ,

−x_s ≤ 0 ,

x_v − x_u ≤ w(u, v) ,

which still has x_v = δ(v) as a solution.

However, δ isn’t the only solution to this set of inequalities. (For example, if all
edge weights are nonnegative, all x_i = 0 is a solution.) To force x_t = δ(t) as
required by the shortest-path problem, add the requirement to maximize (the objective
function) x_t. This is correct because

• max(x_t) ≥ δ(t) because x_t = δ(t) is part of one solution to the set of inequalities,

• max(x_t) ≤ δ(t) can be demonstrated by a technique similar to the proof of
Theorem 24.9:

Let p be a shortest path from s to t. Then by definition,

δ(t) = Σ_{(u,v)∈p} w(u, v) .

But for each edge (u, v) we have the inequality x_v − x_u ≤ w(u, v), so

δ(t) = Σ_{(u,v)∈p} w(u, v) ≥ Σ_{(u,v)∈p} (x_v − x_u) = x_t − x_s .

But x_s = 0, so x_t ≤ δ(t).

Note: Maximizing x_t subject to the above inequalities solves the single-pair
shortest-path problem when t is reachable from s and there are no negative-weight
cycles. But if there’s a negative-weight cycle, the inequalities have no feasible solution
(as demonstrated in the proof of Theorem 24.9); and if t is not reachable
from s, then x_t is unbounded.


Solution to Exercise 24.4-7

Observe that after the first pass, all d values are at most 0, and that relaxing
edges (v_0, v_i) will never again change a d value. Therefore, we can eliminate v_0 by
running the BELLMAN-FORD algorithm on the constraint graph without the v_0 node
but initializing all shortest path estimates to 0 instead of ∞.


Solution to Exercise 24.4-10 

To allow for single-variable constraints, we add the variable x_0 and let it correspond
to the source vertex v_0 of the constraint graph. The idea is that, if there are no
negative-weight cycles containing v_0, we will find that δ(v_0, v_0) = 0. In this case,
we set x_0 = 0, and so we can treat any single-variable constraint using x_i as if it
were a 2-variable constraint with x_0 as the other variable.

Specifically, we treat the constraint x_i ≤ b_k as if it were x_i − x_0 ≤ b_k, and we
add the edge (v_0, v_i) with weight b_k to the constraint graph. We treat the constraint
−x_i ≤ b_k as if it were x_0 − x_i ≤ b_k, and we add the edge (v_i, v_0) with weight b_k to
the constraint graph.

Once we find shortest-path weights from v_0, we set x_i = δ(v_0, v_i) for all i =
0, 1, ..., n; that is, we do as before but also include x_0 as one of the variables that
we set to a shortest-path weight. Since v_0 is the source vertex, either x_0 = 0 or
x_0 < 0.

If δ(v_0, v_0) = 0, so that x_0 = 0, then setting x_i = δ(v_0, v_i) for all i = 0, 1, ..., n
gives a feasible solution for the system. The only new constraints beyond those
in the text are those involving x_0. For constraints x_i ≤ b_k, we use x_i − x_0 ≤ b_k.
By the triangle inequality, δ(v_0, v_i) ≤ δ(v_0, v_0) + w(v_0, v_i) = b_k, and so x_i ≤ b_k.
For constraints −x_i ≤ b_k, we use x_0 − x_i ≤ b_k. By the triangle inequality, 0 =
δ(v_0, v_0) ≤ δ(v_0, v_i) + w(v_i, v_0); thus, 0 ≤ x_i + b_k or, equivalently, −x_i ≤ b_k.

If δ(v_0, v_0) < 0, so that x_0 < 0, then there is a negative-weight cycle containing v_0.
The portion of the proof of Theorem 24.9 that deals with negative-weight cycles
carries through but with v_0 on the negative-weight cycle, and we see that there is
no feasible solution.


Solution to Exercise 24.5-4 

Whenever RELAX sets π for some vertex, it also reduces the vertex’s d value. Thus
if π[s] gets set to a non-NIL value, d[s] is reduced from its initial value of 0 to a
negative number. But d[s] is the weight of some path from s to s, which is a cycle
including s. Thus, there is a negative-weight cycle.


Solution to Exercise 24.5-7

Suppose we have a shortest-paths tree G_π. Relax edges in G_π according to the
order in which a BFS would visit them. Then we are guaranteed that the edges
along each shortest path are relaxed in order. By the path-relaxation property, we
would then have d[v] = δ(s, v) for all v ∈ V. Since G_π contains at most |V| − 1
edges, we need to relax only |V| − 1 edges to get d[v] = δ(s, v) for all v ∈ V.


Solution to Exercise 24.5-8 

Suppose that there is a negative-weight cycle c = ⟨v_0, v_1, ..., v_k⟩, where
v_0 = v_k, that is reachable from the source vertex s; thus, w(c) < 0. Without
loss of generality, c is simple. There must be an acyclic path from s to
some vertex of c that uses no other vertices in c. Without loss of generality let
this vertex of c be v_0, and let this path from s to v_0 be p = ⟨u_0, u_1, ..., u_l⟩,
where u_0 = s and u_l = v_0 = v_k. (It may be the case that u_l = s, in
which case path p has no edges.) After the call to INITIALIZE-SINGLE-SOURCE
sets d[v] = ∞ for all v ∈ V − {s}, perform the following sequence of relaxations.
First, relax every edge in path p, in order. Then relax every edge
in cycle c, in order, and repeatedly relax the cycle. That is, we relax the
edges (u_0, u_1), (u_1, u_2), ..., (u_{l−1}, v_0), (v_0, v_1), (v_1, v_2), ..., (v_{k−1}, v_0), (v_0, v_1),
(v_1, v_2), ..., (v_{k−1}, v_0), (v_0, v_1), (v_1, v_2), ..., (v_{k−1}, v_0), ....

We claim that every edge relaxation in this sequence reduces a shortest-path estimate.
Clearly, the first time we relax an edge (u_{i−1}, u_i) or (v_{j−1}, v_j), for
i = 1, 2, ..., l and j = 1, 2, ..., k − 1 (note that we have not yet relaxed the
last edge of cycle c), we reduce d[u_i] or d[v_j] from ∞ to a finite value. Now
consider the relaxation of any edge (v_{j−1}, v_j) after this opening sequence of relaxations.
We use induction on the number of edge relaxations to show that this
relaxation reduces d[v_j].

Basis: The next edge relaxed after the opening sequence is (v_{k−1}, v_k). Before relaxation,
d[v_k] = w(p), and after relaxation, d[v_k] = w(p) + w(c) < w(p), since
w(c) < 0.

Inductive step: Consider the relaxation of edge (v_{j−1}, v_j). Since c is a simple
cycle, the last time d[v_j] was updated was by a relaxation of this same
edge. By the inductive hypothesis, d[v_{j−1}] has just been reduced. Thus,
d[v_{j−1}] + w(v_{j−1}, v_j) < d[v_j], and so the relaxation will reduce the value
of d[v_j].


Solution to Problem 24-1 


a. Assume for contradiction that G_f is not acyclic; thus G_f has a cycle. A cycle
must have at least one edge (u, v) in which u has higher index than v. This
edge is not in E_f (by the definition of E_f), in contradiction to the assumption
that G_f has a cycle. Thus G_f is acyclic.

⟨v_1, v_2, ..., v_|V|⟩ is a topological sort for G_f, because from the definition of E_f
we know that all edges are directed from smaller indices to larger indices.

The proof for E_b is similar.

b. For all vertices v ∈ V, we know that either δ(s, v) = ∞ or δ(s, v) is finite.
If δ(s, v) = ∞, then d[v] will be ∞. Thus, we need to consider only the
case where d[v] is finite. There must be some shortest path from s to v. Let
p = ⟨v_0, v_1, ..., v_{k−1}, v_k⟩ be that path, where v_0 = s and v_k = v. Let us now
consider how many times there is a change in direction in p, that is, a situation
in which (v_{i−1}, v_i) ∈ E_f and (v_i, v_{i+1}) ∈ E_b or vice versa. There can be at
most |V| − 1 edges in p, so there can be at most |V| − 2 changes in direction.
Any portion of the path where there is no change in direction is computed with
the correct d values in the first or second half of a single pass once the node that
begins the no-change-in-direction sequence has the correct d value, because the
edges are relaxed in the order of the direction of the sequence. Each change in
direction requires a half pass in the new direction of the path. The following
table shows the maximum number of passes needed depending on the parity of
|V| − 1 and the direction of the first edge:

|V| − 1    first edge direction    passes
even       forward                 (|V| − 1)/2
even       backward                (|V| − 1)/2 + 1
odd        forward                 |V|/2
odd        backward                |V|/2

In any case, the maximum number of passes that we will need is ⌈|V|/2⌉.

c. This scheme does not affect the asymptotic running time of the algorithm because
even though we perform only ⌈|V|/2⌉ passes instead of |V| − 1 passes,
it is still O(V) passes. Each pass still takes O(E) time, so the running time
remains O(VE).


Solution to Problem 24-2 

a. Consider boxes with dimensions x = (x_1, ..., x_d), y = (y_1, ..., y_d), and
z = (z_1, ..., z_d). Suppose there exists a permutation π such that x_π(i) < y_i
for i = 1, ..., d and there exists a permutation π' such that y_π'(i) < z_i for
i = 1, ..., d, so that x nests inside y and y nests inside z. Construct a
permutation π'', where π''(i) = π(π'(i)). Then for i = 1, ..., d, we have
x_π''(i) = x_π(π'(i)) < y_π'(i) < z_i, and so x nests inside z.

b. Sort the dimensions of each box from longest to shortest. A box X with
sorted dimensions (x_1, x_2, ..., x_d) nests inside a box Y with sorted dimensions
(y_1, y_2, ..., y_d) if and only if x_i < y_i for i = 1, 2, ..., d. The sorting can
be done in O(d lg d) time, and the test for nesting can be done in O(d) time,
and so the algorithm runs in O(d lg d) time. This algorithm works because a
d-dimensional box can be oriented so that every permutation of its dimensions
is possible. (Experiment with a 3-dimensional box if you are unsure of this.)

c. Construct a dag $G = (V, E)$, where each vertex $v_i$ corresponds to box $B_i$, and
$(v_i, v_j) \in E$ if and only if box $B_i$ nests inside box $B_j$. Graph $G$ is indeed a dag,
because nesting is transitive and antireflexive. The time to construct the dag is
$O(dn^2 + dn \lg d)$, from comparing each of the $\binom{n}{2}$ pairs of boxes after sorting
the dimensions of each.

Add a supersource vertex $s$ and a supersink vertex $t$ to $G$, and add edges $(s, v_i)$
for all vertices $v_i$ with in-degree 0 and $(v_j, t)$ for all vertices $v_j$ with
out-degree 0. Call the resulting dag $G'$. The time to do so is $O(n)$.

Find a longest path from $s$ to $t$ in $G'$. (Section 24.2 discusses how to find a
longest path in a dag.) This path corresponds to a longest sequence of nesting
boxes. The time to find a longest path is $O(n^2)$, since $G'$ has $n + 2$ vertices and
$O(n^2)$ edges.

Overall, this algorithm runs in $O(dn^2 + dn \lg d)$ time.
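
A minimal Python sketch of part (c) follows. It is a simplification, not the construction above verbatim: instead of adding a supersource and supersink, it takes the maximum over all vertices, and it uses the sorted dimension lists as a topological order (if box i nests inside box j, then i's largest dimension is strictly smaller). The name `longest_nesting_sequence` is hypothetical.

```python
def longest_nesting_sequence(boxes):
    """Return indices of a longest sequence in which each box nests in the next."""
    n = len(boxes)
    if n == 0:
        return []
    dims = [sorted(b, reverse=True) for b in boxes]        # O(dn lg d)

    def nests(i, j):
        return all(a < b for a, b in zip(dims[i], dims[j]))

    # Dag edges: i -> j iff box i nests inside box j.  O(dn^2)
    succ = [[j for j in range(n) if j != i and nests(i, j)] for i in range(n)]

    # Lexicographic order of sorted dimensions is a valid topological order.
    order = sorted(range(n), key=lambda i: dims[i])
    best = [1] * n          # best[v] = vertices on a longest path ending at v
    prev = [None] * n
    for i in order:
        for j in succ[i]:
            if best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i

    end = max(range(n), key=lambda v: best[v])
    chain = []
    while end is not None:
        chain.append(end)
        end = prev[end]
    return chain[::-1]
```

The dynamic program over the topological order plays the role of the dag longest-path computation from Section 24.2.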


Solution to Problem 24-3 


a. We can use the Bellman-Ford algorithm on a suitable weighted, directed graph
$G = (V, E)$, which we form as follows. There is one vertex in $V$ for each
currency, and for each pair of currencies $c_i$ and $c_j$, there are directed edges
$(v_i, v_j)$ and $(v_j, v_i)$. (Thus, $|V| = n$ and $|E| = n(n-1)$.)

To determine edge weights, we start by observing that

\[ R[i_1, i_2] \cdot R[i_2, i_3] \cdots R[i_{k-1}, i_k] \cdot R[i_k, i_1] > 1 \]

if and only if

\[ \frac{1}{R[i_1, i_2]} \cdot \frac{1}{R[i_2, i_3]} \cdots \frac{1}{R[i_{k-1}, i_k]} \cdot \frac{1}{R[i_k, i_1]} < 1 . \]

Taking logs of both sides of the inequality above, we express this condition as

\[ \lg \frac{1}{R[i_1, i_2]} + \lg \frac{1}{R[i_2, i_3]} + \cdots + \lg \frac{1}{R[i_{k-1}, i_k]} + \lg \frac{1}{R[i_k, i_1]} < 0 . \]

Therefore, if we define the weight of edge $(v_i, v_j)$ as

\[ w(v_i, v_j) = \lg \frac{1}{R[i, j]} = -\lg R[i, j] , \]

then we want to find whether there exists a negative-weight cycle in $G$ with
these edge weights.


We can determine whether there exists a negative-weight cycle in $G$ by adding
an extra vertex $v_0$ with 0-weight edges $(v_0, v_i)$ for all $v_i \in V$, running
Bellman-Ford from $v_0$, and using the boolean result of Bellman-Ford
(which is TRUE if there are no negative-weight cycles and FALSE if there is a
negative-weight cycle) to guide our answer. That is, we invert the boolean result
of Bellman-Ford.

This method works because adding the new vertex $v_0$ with 0-weight edges
from $v_0$ to all other vertices cannot introduce any new cycles, yet it ensures
that all negative-weight cycles are reachable from $v_0$.

It takes $\Theta(n^2)$ time to create $G$, which has $\Theta(n^2)$ edges. Then it takes $O(n^3)$
time to run Bellman-Ford. Thus, the total time is $O(n^3)$.

Another way to determine whether a negative-weight cycle exists is to create $G$
and, without adding $v_0$ and its incident edges, run either of the all-pairs
shortest-paths algorithms. If the resulting shortest-path distance matrix has any
negative values on the diagonal, then there is a negative-weight cycle.
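
As a rough illustration of part (a), the Python sketch below builds the $-\lg R[i, j]$ weights and runs Bellman-Ford-style relaxation passes. Starting every estimate at 0 stands in for the extra vertex $v_0$ and its 0-weight edges. The function name `has_arbitrage` and the tolerance `eps` (added to absorb floating-point rounding) are assumptions, not part of the solution above.

```python
import math

def has_arbitrage(R, eps=1e-12):
    """Return True if the rate table R admits a cycle whose product exceeds 1."""
    n = len(R)
    # Edge (i, j) gets weight -lg R[i][j]; a profitable cycle is a negative cycle.
    edges = [(i, j, -math.log2(R[i][j]))
             for i in range(n) for j in range(n) if i != j]
    d = [0.0] * n            # as if v0's 0-weight edges were already relaxed
    for _ in range(n):       # the graph with v0 has n + 1 vertices, so n passes
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v] - eps:
                d[v] = d[u] + w
                changed = True
        if not changed:
            return False     # estimates converged: no negative-weight cycle
    # Any edge that can still be relaxed witnesses a negative-weight cycle.
    return any(d[u] + w < d[v] - eps for u, v, w in edges)
```

The return value is the inverse of the boolean that Bellman-Ford itself would return, as described above.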

b. Assuming that we ran Bellman-Ford to solve part (a), we only need to find
the vertices of a negative-weight cycle. We can do so as follows. First, relax
all the edges once more. Since there is a negative-weight cycle, the $d$ value of
some vertex $u$ will change. We just need to repeatedly follow the $\pi$ values until
we get back to $u$. In other words, we can use the recursive method given by the
Print-Path procedure of Section 22.2, but stop it when it returns to vertex $u$.

The running time is $O(n^3)$ to run Bellman-Ford, plus $O(n)$ to print the
vertices of the cycle, for a total of $O(n^3)$ time.


Solution to Problem 24-4 

a. Since all weights are nonnegative, use Dijkstra's algorithm. Implement the
priority queue as an array $Q[0 \,..\, |E| + 1]$, where $Q[i]$ is a list of vertices $v$ for
which $d[v] = i$. Initialize $d[v]$ for $v \ne s$ to $|E| + 1$ instead of to $\infty$, so that all
vertices have a place in $Q$. (Any initial $d[v] > \delta(s, v)$ works in the algorithm,
since $d[v]$ decreases until it reaches $\delta(s, v)$.)

The $|V|$ Extract-Mins can be done in $O(E)$ total time, and decreasing a
$d$ value during relaxation can be done in $O(1)$ time, for a total running time
of $O(E)$.

• When $d[v]$ decreases, just add $v$ to the front of the list in $Q[d[v]]$.

• Extract-Min removes the head of the list in the first nonempty slot of $Q$.
To do Extract-Min without scanning all of $Q$, keep track of the smallest $i$
for which $Q[i]$ is not empty. The key point is that when $d[v]$ decreases
due to relaxation of edge $(u, v)$, $d[v]$ remains $\ge d[u]$, so it never moves to
an earlier slot of $Q$ than the one that had $u$, the previous minimum. Thus
Extract-Min can always scan upward in the array, taking a total of $O(E)$
time for all Extract-Mins.
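
The array-of-lists priority queue can be sketched in Python as follows. This version is an assumption-laden variant: rather than moving a vertex between lists when its key decreases, it appends a fresh entry and skips stale ones on extraction, which keeps the total number of entries at most $|E| + 1$. The names `bucket_dijkstra`, `adj`, and `bound` are hypothetical.

```python
def bucket_dijkstra(n, adj, s, bound):
    """Dijkstra with an array of buckets as the priority queue.

    adj[u] is a list of (v, w) pairs with nonnegative integer weights w.
    Assumes delta(s, v) <= bound for every reachable v (bound = |E| in the
    problem). Unreachable vertices keep the sentinel value bound + 1.
    """
    d = [bound + 1] * n
    d[s] = 0
    buckets = [[] for _ in range(bound + 2)]   # Q[0 .. bound + 1]
    buckets[0].append(s)
    done = [False] * n
    for i in range(bound + 1):                 # the scan only ever moves upward
        bucket = buckets[i]
        while bucket:
            u = bucket.pop()
            if done[u] or d[u] != i:           # stale entry left by a decrease
                continue
            done[u] = True
            for v, w in adj[u]:
                if d[u] + w < d[v]:            # relaxation; new key is >= i
                    d[v] = d[u] + w
                    buckets[d[v]].append(v)
    return d
```

Because a relaxed key never drops below the current slot, the outer scan never has to back up, matching the argument in the second bullet.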

b. For all $(u, v) \in E$, we have $w_1(u, v) \in \{0, 1\}$, so $\delta_1(s, v) \le |V| - 1 \le |E|$. Use
part (a) to get the $O(E)$ time bound.

c. To show that $w_i(u, v) = 2w_{i-1}(u, v)$ or $w_i(u, v) = 2w_{i-1}(u, v) + 1$, observe
that the $i$ bits of $w_i(u, v)$ consist of the $i - 1$ bits of $w_{i-1}(u, v)$ followed by one
more bit. If that low-order bit is 0, then $w_i(u, v) = 2w_{i-1}(u, v)$; if it is 1, then
$w_i(u, v) = 2w_{i-1}(u, v) + 1$.

Notice the following two properties of shortest paths:

1. If all edge weights are multiplied by a factor of $c$, then all shortest-path
weights are multiplied by $c$.

2. If all edge weights are increased by at most $c$, then all shortest-path weights
are increased by at most $c(|V| - 1)$, since all shortest paths have at most
$|V| - 1$ edges.

The lowest possible value for $w_i(u, v)$ is $2w_{i-1}(u, v)$, so by the first
observation, the lowest possible value for $\delta_i(s, v)$ is $2\delta_{i-1}(s, v)$.

The highest possible value for $w_i(u, v)$ is $2w_{i-1}(u, v) + 1$. Therefore,
using the two observations together, the highest possible value for $\delta_i(s, v)$ is
$2\delta_{i-1}(s, v) + |V| - 1$.

d. We have

\begin{align*}
\hat{w}_i(u, v) &= w_i(u, v) + 2\delta_{i-1}(s, u) - 2\delta_{i-1}(s, v) \\
&\ge 2w_{i-1}(u, v) + 2\delta_{i-1}(s, u) - 2\delta_{i-1}(s, v) \\
&\ge 0 .
\end{align*}

The second line follows from part (c). The third line follows from
Lemma 24.10: $\delta_{i-1}(s, v) \le \delta_{i-1}(s, u) + w_{i-1}(u, v)$.

e. Observe that if we compute $\hat{w}_i(p)$ for any path $p : u \leadsto v$, the terms $\delta_{i-1}(s, t)$
cancel for every intermediate vertex $t$ on the path. Thus,

\[ \hat{w}_i(p) = w_i(p) + 2\delta_{i-1}(s, u) - 2\delta_{i-1}(s, v) . \]

(This will be shown in detail in equation (25.10) within the proof of
Lemma 25.1.) The $\delta_{i-1}$ terms depend only on $u$, $v$, and $s$, but not on the path $p$;
therefore the same paths will be of minimum $w_i$ weight and of minimum $\hat{w}_i$
weight between $u$ and $v$. Letting $u = s$, we get

\[ \hat{\delta}_i(s, v) = \delta_i(s, v) + 2\delta_{i-1}(s, s) - 2\delta_{i-1}(s, v) = \delta_i(s, v) - 2\delta_{i-1}(s, v) . \]

Rewriting this result as $\delta_i(s, v) = \hat{\delta}_i(s, v) + 2\delta_{i-1}(s, v)$ and combining it with
$\delta_i(s, v) \le 2\delta_{i-1}(s, v) + |V| - 1$ (from part (c)) gives us $\hat{\delta}_i(s, v) \le |V| - 1 \le |E|$.

f. To compute $\delta_i(s, v)$ from $\delta_{i-1}(s, v)$ for all $v \in V$ in $O(E)$ time:

1. Compute the weights $\hat{w}_i(u, v)$ in $O(E)$ time, as shown in part (d).

2. By part (e), $\hat{\delta}_i(s, v) \le |E|$, so use part (a) to compute all $\hat{\delta}_i(s, v)$ in $O(E)$
time.

3. Compute all $\delta_i(s, v)$ from $\hat{\delta}_i(s, v)$ and $\delta_{i-1}(s, v)$ as shown in part (e), in
$O(V)$ time.

To compute all $\delta(s, v)$ in $O(E \lg W)$ time:

1. Compute $\delta_1(s, v)$ for all $v \in V$. As shown in part (b), this takes $O(E)$ time.

2. For each $i = 2, 3, \ldots, k$, compute all $\delta_i(s, v)$ from $\delta_{i-1}(s, v)$ in $O(E)$
time as shown above. This procedure computes $\delta(s, v) = \delta_k(s, v)$ in time
$O(Ek) = O(E \lg W)$.


Solution to Problem 24-6

Observe that a bitonic sequence can increase, then decrease, then increase, or it can 
decrease, then increase, then decrease. That is, there can be at most two changes of 
direction in a bitonic sequence. Any sequence that increases, then decreases, then 
increases, then decreases has a bitonic sequence as a subsequence. 

Now, let us suppose that we had an even stronger condition than the bitonic
property given in the problem: for each vertex $v \in V$, the weights of the edges along
any shortest path from $s$ to $v$ are increasing. Then we could call
INITIALIZE-SINGLE-SOURCE and then just relax all edges one time, going in increasing order
of weight. Then the edges along every shortest path would be relaxed in order of
their appearance on the path. (We rely on the uniqueness of edge weights to
ensure that the ordering is correct. [Note that the uniqueness assumption was added
in the fifth printing of the text.]) The path-relaxation property (Lemma 24.15)
would guarantee that we would have computed correct shortest paths from $s$ to
each vertex.

If we weaken the condition so that the weights of the edges along any shortest path 
increase and then decrease, we could relax all edges one time, in increasing order 
of weight, and then one more time, in decreasing order of weight. That order, along 
with uniqueness of edge weights, would ensure that we had relaxed the edges of 
every shortest path in order, and again the path-relaxation property would guarantee 
that we would have computed correct shortest paths. 

To make sure that we handle all bitonic sequences, we do as suggested above. That 
is, we perform four passes, relaxing each edge once in each pass. The first and third 
passes relax edges in increasing order of weight, and the second and fourth passes 
in decreasing order. Again, by the path-relaxation property and the uniqueness of 
edge weights, we have computed correct shortest paths. 

The total time is $O(V + E \lg V)$, as follows. The time to sort $|E|$ edges by weight
is $O(E \lg E) = O(E \lg V)$ (since $|E| = O(V^2)$). INITIALIZE-SINGLE-SOURCE
takes $O(V)$ time. Each of the four passes takes $O(E)$ time. Thus, the total time is
$O(E \lg V + V + E) = O(V + E \lg V)$.
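
The four-pass scheme can be sketched in Python as below. The function name `bitonic_shortest_paths` and the edge-list representation are assumptions; the result is only guaranteed correct when the input satisfies the problem's bitonic and unique-weight conditions.

```python
def bitonic_shortest_paths(n, edges, s):
    """Single-source shortest paths when every shortest path is bitonic.

    edges is a list of (u, v, w) triples with distinct weights w.
    """
    INF = float("inf")
    d = [INF] * n                                     # INITIALIZE-SINGLE-SOURCE
    d[s] = 0
    increasing = sorted(edges, key=lambda e: e[2])    # O(E lg V)
    decreasing = increasing[::-1]
    # Passes 1 and 3 go in increasing order, passes 2 and 4 in decreasing order.
    for order in (increasing, decreasing, increasing, decreasing):
        for u, v, w in order:
            if d[u] + w < d[v]:                       # relax edge (u, v)
                d[v] = d[u] + w
    return d
```

In the first test case used to check this sketch, the path 0 → 1 → 2 → 3 has edge weights 1, 3, 2 (increasing, then decreasing), so the first two passes already settle it.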


Chapter 25 overview

Given a directed graph $G = (V, E)$, weight function $w : E \to \mathbb{R}$, $|V| = n$.

Goal: create an $n \times n$ matrix of shortest-path distances $\delta(u, v)$.

Could run Bellman-Ford once from each vertex:

• $O(V^2 E)$, which is $O(V^4)$ if the graph is dense ($E = \Theta(V^2)$).

If no negative-weight edges, could run Dijkstra's algorithm once from each vertex:

• $O(VE \lg V)$ with binary heap, which is $O(V^3 \lg V)$ if dense,

• $O(V^2 \lg V + VE)$ with Fibonacci heap, which is $O(V^3)$ if dense.

We'll see how to do in $O(V^3)$ in all cases, with no fancy data structure.

Shortest paths and matrix multiplication 

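Assume that $G$ is given as adjacency matrix of weights: $W = (w_{ij})$, with vertices
numbered 1 to $n$.

\[ w_{ij} = \begin{cases}
0 & \text{if } i = j , \\
\text{weight of } (i, j) & \text{if } i \ne j,\ (i, j) \in E , \\
\infty & \text{if } i \ne j,\ (i, j) \notin E .
\end{cases} \]

Output is matrix $D = (d_{ij})$, where $d_{ij} = \delta(i, j)$. Won't worry about
predecessors; see book.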

Will use dynamic programming at first. 

Optimal substructure: Recall: subpaths of shortest paths are shortest paths. 

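Recursive solution: Let $l_{ij}^{(m)}$ = weight of shortest path $i \leadsto j$ that contains $\le m$
edges.

• $m = 0$

$\Rightarrow$ there is a shortest path $i \leadsto j$ with $\le m$ edges if and only if $i = j$

\[ \Rightarrow l_{ij}^{(0)} = \begin{cases} 0 & \text{if } i = j , \\ \infty & \text{if } i \ne j . \end{cases} \]

• $m \ge 1$

\begin{align*}
\Rightarrow l_{ij}^{(m)} &= \min\Bigl( l_{ij}^{(m-1)},\ \min_{1 \le k \le n} \{ l_{ik}^{(m-1)} + w_{kj} \} \Bigr) && \text{($k$ is all possible predecessors of $j$)} \\
&= \min_{1 \le k \le n} \{ l_{ik}^{(m-1)} + w_{kj} \} && \text{since $w_{jj} = 0$ for all $j$.}
\end{align*}

• Observe that when $m = 1$, must have $l_{ij}^{(1)} = w_{ij}$.

Conceptually, when the path is restricted to at most 1 edge, the weight of the
shortest path $i \leadsto j$ must be $w_{ij}$.

And the math works out, too:

\begin{align*}
l_{ij}^{(1)} &= \min_{1 \le k \le n} \{ l_{ik}^{(0)} + w_{kj} \} \\
&= l_{ii}^{(0)} + w_{ij} && \text{($l_{ii}^{(0)}$ is the only non-$\infty$ among $l_{ik}^{(0)}$)} \\
&= w_{ij} .
\end{align*}

All simple shortest paths contain $\le n - 1$ edges

\[ \Rightarrow \delta(i, j) = l_{ij}^{(n-1)} = l_{ij}^{(n)} = l_{ij}^{(n+1)} = \cdots \]

Compute a solution bottom-up: Compute $L^{(1)}, L^{(2)}, \ldots, L^{(n-1)}$.

Start with $L^{(1)} = W$, since $l_{ij}^{(1)} = w_{ij}$.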

Go from L (m_1) to L (m) : 

Extend(L, W, n)
    create L', an n × n matrix
    for i ← 1 to n
        do for j ← 1 to n
               do l'_ij ← ∞
                  for k ← 1 to n
                      do l'_ij ← min(l'_ij, l_ik + w_kj)
    return L'

Compute each L^(m):

Slow-APSP(W, n)
    L^(1) ← W
    for m ← 2 to n − 1
        do L^(m) ← Extend(L^(m−1), W, n)
    return L^(n−1)

Time:

• Extend: $\Theta(n^3)$.

• Slow-APSP: $\Theta(n^4)$.
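
For readers who want to run the two procedures, here is a direct Python transcription under the assumption that matrices are lists of lists, vertices are numbered from 0, and missing edges are `float("inf")`. The names `extend` and `slow_apsp` simply mirror the pseudocode.

```python
INF = float("inf")

def extend(L, W):
    """One min-plus 'product': allow paths to use one more edge."""
    n = len(L)
    Lp = [[INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Lp[i][j] = min(Lp[i][j], L[i][k] + W[k][j])
    return Lp

def slow_apsp(W):
    """Compute L^(n-1) by n - 2 successive calls to extend."""
    n = len(W)
    L = W                      # L^(1) = W
    for _ in range(2, n):      # m = 2, ..., n - 1
        L = extend(L, W)
    return L
```

On the 3-vertex cycle with edges 0 → 1 (weight 3), 1 → 2 (weight −2), and 2 → 0 (weight 4), a single call to `extend` already yields all shortest-path weights.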

Observation: Extend is like matrix multiplication:

    L  → A
    W  → B
    L' → C
    min → +
    +  → ·
    ∞  → 0

    create C, an n × n matrix
    for i ← 1 to n
        do for j ← 1 to n
               do c_ij ← 0
                  for k ← 1 to n
                      do c_ij ← c_ij + a_ik · b_kj

So, we can view Extend as just like matrix multiplication!

Why do we care? 

Because our goal is to compute $L^{(n-1)}$ as fast as we can. Don't need to compute
all the intermediate $L^{(1)}, L^{(2)}, L^{(3)}, \ldots, L^{(n-2)}$.

Suppose we had a matrix $A$ and we wanted to compute $A^{n-1}$ (like calling Extend
$n - 1$ times).

Could compute $A, A^2, A^4, A^8, \ldots$

If we knew $A^m = A^{n-1}$ for all $m \ge n - 1$, could just finish with $A^r$, where $r$ is the
smallest power of 2 that's $\ge n - 1$. ($r = 2^{\lceil \lg(n-1) \rceil}$.)

Faster-APSP(W, n)
    L^(1) ← W
    m ← 1
    while m < n − 1
        do L^(2m) ← Extend(L^(m), L^(m), n)
           m ← 2m
    return L^(m)

OK to overshoot, since products don't change after $L^{(n-1)}$.

Time: $\Theta(n^3 \lg n)$.
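
A small Python sketch of repeated squaring, under the same list-of-lists and `float("inf")` assumptions as before; `extend` is repeated here so that the block runs on its own.

```python
INF = float("inf")

def extend(A, B):
    """Min-plus product of two n × n matrices."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def faster_apsp(W):
    """Square L^(m) until m >= n - 1; overshooting is harmless."""
    n = len(W)
    L, m = W, 1
    while m < n - 1:
        L = extend(L, L)       # L^(2m) from L^(m)
        m *= 2
    return L
```

With $n = 5$ the loop computes $L^{(2)}$ and $L^{(4)}$ and stops, so only two min-plus products are needed instead of three.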

Floyd-Warshall algorithm 

A different dynamic-programming approach. 

For path $p = \langle v_1, v_2, \ldots, v_l \rangle$, an intermediate vertex is any vertex of $p$ other than
$v_1$ or $v_l$.

Let $d_{ij}^{(k)}$ = shortest-path weight of any path $i \leadsto j$ with all intermediate vertices in
$\{1, 2, \ldots, k\}$.

Consider a shortest path $p$ from $i$ to $j$ with all intermediate vertices in $\{1, 2, \ldots, k\}$:

• If $k$ is not an intermediate vertex, then all intermediate vertices of $p$ are in
$\{1, 2, \ldots, k - 1\}$.

• If $k$ is an intermediate vertex, then $p$ consists of a subpath $i \leadsto k$ followed by a
subpath $k \leadsto j$, each with all intermediate vertices in $\{1, 2, \ldots, k - 1\}$.

Recursive formulation

\[ d_{ij}^{(k)} = \begin{cases}
w_{ij} & \text{if } k = 0 , \\
\min\bigl( d_{ij}^{(k-1)},\ d_{ik}^{(k-1)} + d_{kj}^{(k-1)} \bigr) & \text{if } k \ge 1 .
\end{cases} \]

(Have $d_{ij}^{(0)} = w_{ij}$ because can't have intermediate vertices $\Rightarrow$ $\le 1$ edge.)
Want $D^{(n)} = (d_{ij}^{(n)})$, since all vertices numbered $\le n$.


Compute bottom-up

Compute in increasing order of $k$:

Floyd-Warshall(W, n)
    D^(0) ← W
    for k ← 1 to n
        do for i ← 1 to n
               do for j ← 1 to n
                      do d_ij^(k) ← min(d_ij^(k−1), d_ik^(k−1) + d_kj^(k−1))
    return D^(n)

Can drop superscripts. (See Exercise 25.2-4 in text.)

Time: $\Theta(n^3)$.
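
A compact Python rendering of the procedure with the superscripts dropped, assuming a 0-indexed list-of-lists weight matrix with `float("inf")` for missing edges. The function name `floyd_warshall` mirrors the pseudocode.

```python
INF = float("inf")

def floyd_warshall(W):
    """All-pairs shortest-path weights, updating a single matrix in place."""
    n = len(W)
    D = [row[:] for row in W]          # D^(0) = W; leave the input untouched
    for k in range(n):                 # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                through_k = D[i][k] + D[k][j]
                if through_k < D[i][j]:
                    D[i][j] = through_k
    return D
```

Only one matrix is kept, which is the $\Theta(n^2)$-space version justified by Exercise 25.2-4.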


Transitive closure 

Given G = ( V , E ), directed. 

Compute G* = ( V , E*). 

• $E^* = \{(i, j) : \text{there is a path } i \leadsto j \text{ in } G\}$.

Could assign weight of 1 to each edge, then run Floyd-Warshall.

• If $d_{ij} < n$, then there is a path $i \leadsto j$.

• Otherwise, $d_{ij} = \infty$ and there is no path.

Simpler way: Substitute other values and operators in Floyd-Warshall.

• Use unweighted adjacency matrix

• min → ∨ (OR)

• + → ∧ (AND)

\[ t_{ij}^{(k)} = \begin{cases}
1 & \text{if there is path } i \leadsto j \text{ with all intermediate vertices in } \{1, 2, \ldots, k\} , \\
0 & \text{otherwise} .
\end{cases} \]

\[ t_{ij}^{(0)} = \begin{cases}
0 & \text{if } i \ne j \text{ and } (i, j) \notin E , \\
1 & \text{if } i = j \text{ or } (i, j) \in E .
\end{cases} \]

\[ t_{ij}^{(k)} = t_{ij}^{(k-1)} \vee \bigl( t_{ik}^{(k-1)} \wedge t_{kj}^{(k-1)} \bigr) . \]

Transitive-Closure(E, n)
    for i ← 1 to n
        do for j ← 1 to n
               do if i = j or (i, j) ∈ E[G]
                     then t_ij^(0) ← 1
                     else t_ij^(0) ← 0
    for k ← 1 to n
        do for i ← 1 to n
               do for j ← 1 to n
                      do t_ij^(k) ← t_ij^(k−1) ∨ (t_ik^(k−1) ∧ t_kj^(k−1))
    return T^(n)


Time: $\Theta(n^3)$, but simpler operations than Floyd-Warshall.
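
The boolean version translates almost line for line into Python. The sketch below assumes vertices numbered from 0 and an edge list of pairs; the name `transitive_closure` is chosen for illustration.

```python
def transitive_closure(n, edges):
    """Return T where T[i][j] is True iff there is a path from i to j."""
    # T^(0): reflexive entries plus the edges themselves.
    T = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        T[u][v] = True
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # OR replaces min, AND replaces +.
                T[i][j] = T[i][j] or (T[i][k] and T[k][j])
    return T
```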


Johnson’s algorithm 

Idea: If the graph is sparse, it pays to run Dijkstra’s algorithm once from each 
vertex. 

If we use a Fibonacci heap for the priority queue, the running time is down
to $O(V^2 \lg V + VE)$, which is better than Floyd-Warshall's $\Theta(V^3)$ time if
$E = o(V^2)$.

But Dijkstra's algorithm requires that all edge weights be nonnegative.

Donald Johnson figured out how to make an equivalent graph that does have all
edge weights $\ge 0$.


Reweighting 

Compute a new weight function $\hat{w}$ such that

1. For all $u, v \in V$, $p$ is a shortest path $u \leadsto v$ using $w$ if and only if $p$ is a shortest
path $u \leadsto v$ using $\hat{w}$.

2. For all $(u, v) \in E$, $\hat{w}(u, v) \ge 0$.

Property (1) says that it suffices to find shortest paths with $\hat{w}$. Property (2) says we
can do so by running Dijkstra's algorithm from each vertex.

How to come up with $\hat{w}$?

Lemma shows it’s easy to get property (1): 

Lemma (Reweighting doesn’t change shortest paths) 

Given a directed, weighted graph $G = (V, E)$, $w : E \to \mathbb{R}$. Let $h$ be any function
such that $h : V \to \mathbb{R}$. For all $(u, v) \in E$, define

\[ \hat{w}(u, v) = w(u, v) + h(u) - h(v) . \]

Let $p = \langle v_0, v_1, \ldots, v_k \rangle$ be any path $v_0 \leadsto v_k$.

Then, $p$ is a shortest path $v_0 \leadsto v_k$ with $w$ if and only if $p$ is a shortest path $v_0 \leadsto v_k$
with $\hat{w}$. (Formally, $w(p) = \delta(v_0, v_k)$ if and only if $\hat{w}(p) = \hat{\delta}(v_0, v_k)$, where $\hat{\delta}$ is the
shortest-path weight with $\hat{w}$.)

Also, $G$ has a negative-weight cycle with $w$ if and only if $G$ has a negative-weight
cycle with $\hat{w}$.

Proof First, we'll show that $\hat{w}(p) = w(p) + h(v_0) - h(v_k)$:

\begin{align*}
\hat{w}(p) &= \sum_{i=1}^{k} \hat{w}(v_{i-1}, v_i) \\
&= \sum_{i=1}^{k} \bigl( w(v_{i-1}, v_i) + h(v_{i-1}) - h(v_i) \bigr) \\
&= \sum_{i=1}^{k} w(v_{i-1}, v_i) + h(v_0) - h(v_k) && \text{(sum telescopes)} \\
&= w(p) + h(v_0) - h(v_k) .
\end{align*}

Therefore, any path $p$ from $v_0$ to $v_k$ has $\hat{w}(p) = w(p) + h(v_0) - h(v_k)$. Since $h(v_0)$
and $h(v_k)$ don't depend on the path from $v_0$ to $v_k$, if one path $v_0 \leadsto v_k$ is shorter
than another with $w$, it's also shorter with $\hat{w}$.

Now show there exists a negative-weight cycle with $w$ if and only if there exists a
negative-weight cycle with $\hat{w}$:

• Let cycle $c = \langle v_0, v_1, \ldots, v_k \rangle$, where $v_0 = v_k$.

• Then

\begin{align*}
\hat{w}(c) &= w(c) + h(v_0) - h(v_k) \\
&= w(c) && \text{(since $v_0 = v_k$)} .
\end{align*}

Therefore, $c$ is a negative-weight cycle with $w$ if and only if it is a negative-weight
cycle with $\hat{w}$. ■ (lemma)

So, now to get property (2), we just need to come up with a function $h : V \to \mathbb{R}$
such that when we compute $\hat{w}(u, v) = w(u, v) + h(u) - h(v)$, it's $\ge 0$.

Do what we did for difference constraints:

• $G' = (V', E')$

• $V' = V \cup \{s\}$, where $s$ is a new vertex.

• $E' = E \cup \{(s, v) : v \in V\}$.

• $w(s, v) = 0$ for all $v \in V$.

• Since no edges enter $s$, $G'$ has the same set of cycles as $G$. In particular, $G'$ has
a negative-weight cycle if and only if $G$ does.

Define $h(v) = \delta(s, v)$ for all $v \in V$.

Claim

$\hat{w}(u, v) = w(u, v) + h(u) - h(v) \ge 0$.

Proof By the triangle inequality,

\begin{align*}
\delta(s, v) &\le \delta(s, u) + w(u, v) \\
h(v) &\le h(u) + w(u, v) .
\end{align*}

Therefore, $w(u, v) + h(u) - h(v) \ge 0$. ■ (claim)


Johnson’s algorithm 

form G'
run Bellman-Ford on G' to compute δ(s, v) for all v ∈ V'
if Bellman-Ford returns FALSE
    then G has a negative-weight cycle
    else compute ŵ(u, v) = w(u, v) + δ(s, u) − δ(s, v) for all (u, v) ∈ E
         for each vertex u ∈ V
             do run Dijkstra's algorithm from u using weight function ŵ
                    to compute δ̂(u, v) for all v ∈ V
                for each vertex v ∈ V
                    do ▷ Compute entry d_uv in matrix D
                       d_uv = δ̂(u, v) + δ(s, v) − δ(s, u)
                           (because if p is a path u ⇝ v,
                            then ŵ(p) = w(p) + h(u) − h(v))


Time:

• $\Theta(V + E)$ to compute $G'$.

• $O(VE)$ to run Bellman-Ford.

• $\Theta(E)$ to compute $\hat{w}$.

• $O(V^2 \lg V + VE)$ to run Dijkstra's algorithm $|V|$ times (using Fibonacci heap).

• $\Theta(V^2)$ to compute $D$ matrix.

Total: $O(V^2 \lg V + VE)$.
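
The whole pipeline can be sketched in Python as below. Two assumptions differ from the analysis above: the priority queue is Python's binary heap (`heapq`), not a Fibonacci heap, so the Dijkstra phase costs $O(VE \lg V)$; and vertices are numbered from 0 with the new vertex $s$ given index $n$. The name `johnson` and the `None` return value for a negative-weight cycle are illustrative choices.

```python
import heapq

INF = float("inf")

def johnson(n, edges):
    """Return the matrix D of shortest-path weights, or None on a negative cycle."""
    # Form G': new vertex s = n with a 0-weight edge to every vertex.
    aug = list(edges) + [(n, v, 0) for v in range(n)]
    h = [INF] * (n + 1)
    h[n] = 0
    for _ in range(n):                       # Bellman-Ford: |V'| - 1 passes
        for u, v, w in aug:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
    if any(h[u] + w < h[v] for u, v, w in aug):
        return None                          # negative-weight cycle
    # Reweight: w_hat(u, v) = w(u, v) + h(u) - h(v) >= 0.
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))
    D = [[INF] * n for _ in range(n)]
    for u in range(n):                       # Dijkstra from each vertex
        dist = [INF] * n
        dist[u] = 0
        pq = [(0, u)]
        while pq:
            dx, x = heapq.heappop(pq)
            if dx > dist[x]:
                continue                     # stale queue entry
            for y, w in adj[x]:
                if dx + w < dist[y]:
                    dist[y] = dx + w
                    heapq.heappush(pq, (dist[y], y))
        for v in range(n):
            if dist[v] < INF:                # undo the reweighting
                D[u][v] = dist[v] + h[v] - h[u]
    return D
```

Unreachable pairs are left at infinity rather than being adjusted, which sidesteps the undefined $\infty - \infty$ case.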


Solution to Exercise 25.1-3

The matrix $L^{(0)}$ corresponds to the identity matrix

\[ I = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix} \]

of regular matrix multiplication. Substitute 0 (the identity for +) for $\infty$ (the
identity for min), and 1 (the identity for ·) for 0 (the identity for +).


Solution to Exercise 25.1-5 

The all-pairs shortest-paths algorithm in Section 25.1 computes

\[ L^{(n-1)} = W^{n-1} = L^{(0)} \cdot W^{n-1} , \]

where $l_{ij}^{(n-1)} = \delta(i, j)$ and $L^{(0)}$ is the identity matrix. That is, the entry in the
$i$th row and $j$th column of the matrix "product" is the shortest-path distance from
vertex $i$ to vertex $j$, and row $i$ of the product is the solution to the single-source
shortest-paths problem for vertex $i$.

Notice that in a matrix "product" $C = A \cdot B$, the $i$th row of $C$ is the $i$th row of $A$
"multiplied" by $B$. Since all we want is the $i$th row of $C$, we never need more than
the $i$th row of $A$.

Thus the solution to the single-source shortest-paths from vertex $i$ is $L_i^{(0)} \cdot W^{n-1}$,
where $L_i^{(0)}$ is the $i$th row of $L^{(0)}$: a vector whose $i$th entry is 0 and whose other
entries are $\infty$.

Doing the above "multiplications" starting from the left is essentially the same
as the Bellman-Ford algorithm. The vector corresponds to the $d$ values in
Bellman-Ford: the shortest-path estimates from the source to each vertex.

• The vector is initially 0 for the source and $\infty$ for all other vertices, the same as
the values set up for $d$ by INITIALIZE-SINGLE-SOURCE.

• Each "multiplication" of the current vector by $W$ relaxes all edges just as
Bellman-Ford does. That is, a distance estimate in the row, say the distance
to $v$, is updated to a smaller estimate, if any, formed by adding some $w(u, v)$ to
the current estimate of the distance to $u$.

• The relaxation/multiplication is done $n - 1$ times.
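
The row-vector view can be sketched in Python as follows, assuming a 0-indexed weight matrix with zero diagonal and `float("inf")` for missing edges. The function name `sssp_by_row_products` is hypothetical.

```python
INF = float("inf")

def sssp_by_row_products(W, s):
    """Single-source shortest paths as n - 1 min-plus vector-matrix products."""
    n = len(W)
    row = [INF] * n            # row s of L^(0): 0 at the source, infinity elsewhere
    row[s] = 0
    for _ in range(n - 1):
        # One "multiplication" by W relaxes every edge (k, j) once.
        row = [min(row[k] + W[k][j] for k in range(n)) for j in range(n)]
    return row
```

Each product costs $\Theta(n^2)$, so the whole computation is $\Theta(n^3)$, in line with Bellman-Ford on a dense graph.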


Solution to Exercise 25.1-10 

Run Slow-All-Pairs-Shortest-Paths on the graph. Look at the diagonal
elements of $L^{(m)}$. Return the first value of $m$ for which one (or more) of the diagonal
elements is negative. If $m$ reaches $n + 1$, then stop and declare that there are
no negative-weight cycles.

Let the number of edges in a minimum-length negative-weight cycle be $m^*$, where
$m^* = \infty$ if the graph has no negative-weight cycles.

Correctness: Let's assume that for some value $m^* \le n$ and some value of $i$, we
find that $l_{ii}^{(m^*)} < 0$. Then the graph has a cycle with $m^*$ edges that goes from vertex $i$
to itself, and this cycle has negative weight (stored in $l_{ii}^{(m^*)}$). This is the
minimum-length negative-weight cycle because Slow-All-Pairs-Shortest-Paths
computes all paths of 1 edge, then all paths of 2 edges, and so on, and all cycles shorter
than $m^*$ edges were checked before and did not have negative weight. Now
assume that for all $m \le n$, there is no negative element. This means there is no
negative-weight cycle in the graph, because all cycles have length at most $n$.

Time: $O(n^4)$. More precisely, $\Theta(n^3 \cdot \min(n, m^*))$.
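
A short Python sketch of this slower method, assuming a 0-indexed weight matrix with zero diagonal and `float("inf")` for missing edges. The name `min_length_negative_cycle` and the `None` return value are illustrative.

```python
INF = float("inf")

def extend(L, W):
    """Min-plus product: paths may use one more edge."""
    n = len(L)
    return [[min(L[i][k] + W[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def min_length_negative_cycle(W):
    """Return m*, the fewest edges on any negative-weight cycle, or None."""
    n = len(W)
    L = W                                   # L^(1)
    for m in range(1, n + 1):
        if m > 1:
            L = extend(L, W)                # L^(m) from L^(m-1)
        if any(L[i][i] < 0 for i in range(n)):
            return m                        # first m with a negative diagonal entry
    return None                             # m reached n + 1: no negative cycle
```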


Faster solution 

Run Faster-All-Pairs-Shortest-Paths on the graph until the first time that
the matrix $L^{(m)}$ has one or more negative values on the diagonal, or until we have
computed $L^{(m)}$ for some $m > n$. If we find any negative entries on the diagonal,
we know that the minimum-length negative-weight cycle has more than $m/2$ edges
and at most $m$ edges. We just need to binary search for the value of $m^*$ in the range
$m/2 < m^* \le m$. The key observation is that on our way to computing $L^{(m)}$, we
computed $L^{(1)}, L^{(2)}, L^{(4)}, L^{(8)}, \ldots, L^{(m/2)}$, and these matrices suffice to compute
every matrix we'll need. Here's pseudocode:

Find-Min-Length-Neg-Weight-Cycle(W)
    n ← rows[W]
    L^(1) ← W
    m ← 1
    while m ≤ n and no diagonal entries of L^(m) are negative
        do L^(2m) ← Extend-Shortest-Paths(L^(m), L^(m))
           m ← 2m
    if m > n and no diagonal entries of L^(m) are negative
        then return "no negative-weight cycles"
    elseif m ≤ 2
        then return m
    else
        low ← m/2
        high ← m
        d ← m/4
        while d ≥ 1
            do s ← low + d
               L^(s) ← Extend-Shortest-Paths(L^(low), L^(d))
               if L^(s) has any negative entries on the diagonal
                   then high ← s
                   else low ← s
               d ← d/2
        return high

Correctness: If, after the first while loop, $m > n$ and no diagonal entries of $L^{(m)}$
are negative, then there is no negative-weight cycle. Otherwise, if $m \le 2$, then
either $m = 1$ or $m = 2$, and $L^{(m)}$ is the first matrix with a negative entry on the
diagonal. Thus, the correct value to return is $m$.

If $m > 2$, then we maintain an interval bracketed by the values low and high, such
that the correct value $m^*$ is in the range $\mathit{low} < m^* \le \mathit{high}$. We use the following
loop invariant:

Loop invariant: At the start of each iteration of the "while $d \ge 1$" loop,

1. $d = 2^p$ for some integer $p \ge -1$,

2. $d = (\mathit{high} - \mathit{low})/2$,

3. $\mathit{low} < m^* \le \mathit{high}$.

Initialization: Initially, $m$ is an integer power of 2 and $m > 2$. Since $d = m/4$,
we have that $d$ is an integer power of 2 and $d > 1/2$, so that $d = 2^p$ for some
integer $p \ge 0$. We also have $(\mathit{high} - \mathit{low})/2 = (m - (m/2))/2 = m/4 = d$.
Finally, $L^{(m)}$ has a negative entry on the diagonal and $L^{(m/2)}$ does not. Since
$\mathit{low} = m/2$ and $\mathit{high} = m$, we have that $\mathit{low} < m^* \le \mathit{high}$.

Maintenance: We use high, low, and $d$ to denote variable values in a given
iteration, and high', low', and $d'$ to denote the same variable values in the next
iteration. Thus, we wish to show that $d = 2^p$ for some integer $p \ge -1$
implies $d' = 2^{p'}$ for some integer $p' \ge -1$, that $d = (\mathit{high} - \mathit{low})/2$ implies
$d' = (\mathit{high}' - \mathit{low}')/2$, and that $\mathit{low} < m^* \le \mathit{high}$ implies $\mathit{low}' < m^* \le \mathit{high}'$.

To see that $d' = 2^{p'}$, note that $d' = d/2$, and so $d' = 2^{p-1}$. The condition that
$d \ge 1$ implies that $p \ge 0$, and so $p' \ge -1$.

Within each iteration, $s$ is set to $\mathit{low} + d$, and one of the following actions
occurs:

• If $L^{(s)}$ has any negative entries on the diagonal, then high is set to $s$ and
$d'$ is set to $d/2$. Upon entering the next iteration, $(\mathit{high}' - \mathit{low}')/2 =
(s - \mathit{low}')/2 = ((\mathit{low} + d) - \mathit{low})/2 = d/2 = d'$. Since $L^{(s)}$ has a negative
diagonal entry, we know that $m^* \le s$. Because $\mathit{high}' = s$ and $\mathit{low}' = \mathit{low}$,
we have that $\mathit{low}' < m^* \le \mathit{high}'$.

• If $L^{(s)}$ has no negative entries on the diagonal, then low is set to $s$, and
$d'$ is set to $d/2$. Upon entering the next iteration, $(\mathit{high}' - \mathit{low}')/2 =
(\mathit{high}' - s)/2 = (\mathit{high} - (\mathit{low} + d))/2 = (\mathit{high} - \mathit{low})/2 - d/2 = d - d/2 =
d/2 = d'$. Since $L^{(s)}$ has no negative diagonal entries, we know that $m^* > s$.
Because $\mathit{low}' = s$ and $\mathit{high}' = \mathit{high}$, we have that $\mathit{low}' < m^* \le \mathit{high}'$.

Termination: At termination, $d < 1$. Since $d = 2^p$ for some integer $p \ge -1$,
we must have $p = -1$, so that $d = 1/2$. By the second part of the loop
invariant, if we multiply both sides by 2, we get that $\mathit{high} - \mathit{low} = 2d = 1$.
By the third part of the loop invariant, we know that $\mathit{low} < m^* \le \mathit{high}$. Since
$\mathit{high} - \mathit{low} = 2d = 1$ and $m^* > \mathit{low}$, the only possible value for $m^*$ is high,
which the procedure returns.

Time: If there is no negative-weight cycle, the first while loop iterates $\Theta(\lg n)$
times, and the total time is $\Theta(n^3 \lg n)$.

Now suppose that there is a negative-weight cycle. We claim that each time we
call Extend-Shortest-Paths($L^{(\mathit{low})}$, $L^{(d)}$), we have already computed $L^{(\mathit{low})}$
and $L^{(d)}$. Initially, since $\mathit{low} = m/2$, we had already computed $L^{(\mathit{low})}$ in the first
while loop. In succeeding iterations of the second while loop, the only way that low
changes is when it gets the value of $s$, and we have just computed $L^{(s)}$. As for $L^{(d)}$,
observe that $d$ takes on the values $m/4, m/8, m/16, \ldots, 1$, and again, we computed
all of these $L$ matrices in the first while loop. Thus, the claim is proven. Each of
the two while loops iterates $\Theta(\lg m^*)$ times. Since we have already computed the
parameters to each call of Extend-Shortest-Paths, each iteration is dominated
by the $\Theta(n^3)$-time call to Extend-Shortest-Paths. Thus, the total time is
$\Theta(n^3 \lg m^*)$.

In general, therefore, the running time is $\Theta(n^3 \lg \min(n, m^*))$.

Space: The slower algorithm needs to keep only three $n \times n$ matrices at any time,
and so its space requirement is $\Theta(n^2)$. This faster algorithm needs to
maintain $\Theta(\lg \min(n, m^*))$ matrices, and so the space requirement increases to
$\Theta(n^2 \lg \min(n, m^*))$.


Solution to Exercise 25.2-4 

With the superscripts, the computation is $d_{ij}^{(k)} \leftarrow \min\bigl(d_{ij}^{(k-1)},\ d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\bigr)$. If,
having dropped the superscripts, we were to compute and store $d_{ik}$ or $d_{kj}$ before
using these values to compute $d_{ij}$, we might be computing one of the following:

\begin{align*}
d_{ij}^{(k)} &\leftarrow \min\bigl(d_{ij}^{(k-1)},\ d_{ik}^{(k)} + d_{kj}^{(k-1)}\bigr) \\
d_{ij}^{(k)} &\leftarrow \min\bigl(d_{ij}^{(k-1)},\ d_{ik}^{(k-1)} + d_{kj}^{(k)}\bigr) \\
d_{ij}^{(k)} &\leftarrow \min\bigl(d_{ij}^{(k-1)},\ d_{ik}^{(k)} + d_{kj}^{(k)}\bigr)
\end{align*}

In any of these scenarios, we're computing the weight of a shortest path from $i$ to $j$
with all intermediate vertices in $\{1, 2, \ldots, k\}$. If we use $d_{ik}^{(k)}$, rather than $d_{ik}^{(k-1)}$,
in the computation, then we're using a subpath from $i$ to $k$ with all intermediate
vertices in $\{1, 2, \ldots, k\}$. But $k$ cannot be an intermediate vertex on a shortest path
from $i$ to $k$, since otherwise there would be a cycle on this shortest path. Thus,
$d_{ik}^{(k)} = d_{ik}^{(k-1)}$. A similar argument applies to show that $d_{kj}^{(k)} = d_{kj}^{(k-1)}$. Hence, we
can drop the superscripts in the computation.


Solution to Exercise 25.2-6 

Here are two ways to detect negative-weight cycles: 

1. Check the main-diagonal entries of the result matrix for a negative value. There
is a negative-weight cycle if and only if $d_{ii}^{(n)} < 0$ for some vertex $i$:

• $d_{ii}^{(n)}$ is a path weight from $i$ to itself; so if it is negative, there is a path from $i$
to itself (i.e., a cycle), with negative weight.

• If there is a negative-weight cycle, consider the one with the fewest vertices.

    • If it has just one vertex, then some $w_{ii} < 0$, so $d_{ii}$ starts out negative, and
    since $d$ values are never increased, it is also negative when the algorithm
    terminates.

    • If it has at least two vertices, let $k$ be the highest-numbered vertex in the
    cycle, and let $i$ be some other vertex in the cycle. $d_{ik}^{(k-1)}$ and $d_{ki}^{(k-1)}$ have
    correct shortest-path weights, because they are not based on
    negative-weight cycles. (Neither $d_{ik}^{(k-1)}$ nor $d_{ki}^{(k-1)}$ can include $k$ as an intermediate
    vertex, and $i$ and $k$ are on the negative-weight cycle with the fewest
    vertices.) Since $i \leadsto k \leadsto i$ is a negative-weight cycle, the sum of
    those two weights is negative, so $d_{ii}^{(k)}$ will be set to a negative value.
    Since $d$ values are never increased, it is also negative when the algorithm
    terminates.

In fact, it suffices to check whether $d_{ii}^{(n-1)} < 0$ for some vertex $i$. Here's why.
A negative-weight cycle containing vertex $i$ either contains vertex $n$ or it does
not. If it does not, then clearly $d_{ii}^{(n-1)} < 0$. If the negative-weight cycle contains
vertex $n$, then consider $d_{nn}^{(n-1)}$. This value must be negative, since the cycle,
starting and ending at vertex $n$, does not include vertex $n$ as an intermediate
vertex.

2. Alternatively, one could just run the normal Floyd-Warshall algorithm one
extra iteration to see if any of the $d$ values change. If there are negative cycles,
then some shortest-path cost will be cheaper. If there are no such cycles, then
no $d$ values will change because the algorithm gives the correct shortest paths.
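
The first method, checking the main diagonal after the algorithm finishes, can be sketched in Python as follows. The function name `has_negative_cycle` is hypothetical, and the input is assumed to be a 0-indexed weight matrix with `float("inf")` for missing edges.

```python
INF = float("inf")

def has_negative_cycle(W):
    """Run Floyd-Warshall, then look for a negative main-diagonal entry."""
    n = len(W)
    D = [row[:] for row in W]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    # d_ii < 0 means some cycle through i has negative total weight.
    return any(D[i][i] < 0 for i in range(n))
```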


Solution to Exercise 25.3-4

It changes shortest paths. Consider the following graph. $V = \{s, x, y, z\}$, and
there are 4 edges: $w(s, x) = 2$, $w(x, y) = 2$, $w(s, y) = 5$, and $w(s, z) = -10$.
So we'd add 10 to every weight to make $\hat{w}$. With $w$, the shortest path from $s$ to $y$
is $s \to x \to y$, with weight 4. With $\hat{w}$, the shortest path from $s$ to $y$ is $s \to y$,
with weight 15. (The path $s \to x \to y$ has weight 24.) The problem is that by just
adding the same amount to every edge, you penalize paths with more edges, even
if their weights are low.
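
The arithmetic in this counterexample is easy to check mechanically. The snippet below is only a verification aid; the dictionary layout and the helper name `path_weight` are assumptions.

```python
# Edge weights of the counterexample graph.
w = {("s", "x"): 2, ("x", "y"): 2, ("s", "y"): 5, ("s", "z"): -10}

# Add the same constant to every edge so that all weights are nonnegative.
shift = -min(w.values())
w_hat = {edge: weight + shift for edge, weight in w.items()}

def path_weight(path, weights):
    """Sum the weights of the consecutive edges along a vertex sequence."""
    return sum(weights[(a, b)] for a, b in zip(path, path[1:]))

two_edge = ["s", "x", "y"]
one_edge = ["s", "y"]
original = (path_weight(two_edge, w), path_weight(one_edge, w))
shifted = (path_weight(two_edge, w_hat), path_weight(one_edge, w_hat))
```

Here `original` compares the two routes under $w$ and `shifted` compares them under $\hat{w}$; the two-edge route wins in the first pair and loses in the second.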


Solution to Exercise 25.3-6 

In this solution, we assume that $\infty - \infty$ is undefined; in particular, it's not 0.

Let $G = (V, E)$, where $V = \{s, u\}$, $E = \{(u, s)\}$, and $w(u, s) = 0$. There
is only one edge, and it enters $s$. When we run Bellman-Ford from $s$, we get
$h(s) = \delta(s, s) = 0$ and $h(u) = \delta(s, u) = \infty$. When we reweight, we get
$\hat{w}(u, s) = 0 + \infty - 0 = \infty$. We compute $\hat{\delta}(u, s) = \infty$, and so we compute
$d_{us} = \infty + 0 - \infty \ne 0$. Since $\delta(u, s) = 0$, we get an incorrect answer.

If the graph $G$ is strongly connected, then we get $h(v) = \delta(s, v) < \infty$ for all
vertices $v \in V$. Thus, the triangle inequality says that $h(v) \le h(u) + w(u, v)$ for all
edges $(u, v) \in E$, and so $\hat{w}(u, v) = w(u, v) + h(u) - h(v) \ge 0$. Moreover, all edge
weights $\hat{w}(u, v)$ used in Lemma 25.1 are finite, and so the lemma holds. Therefore,
the conditions we need in order to use Johnson's algorithm hold: that reweighting
does not change shortest paths, and that all edge weights $\hat{w}(u, v)$ are nonnegative.
Again relying on $G$ being strongly connected, we get that $\hat{\delta}(u, v) < \infty$ for all
edges $(u, v) \in E$, which means that $d_{uv} = \hat{\delta}(u, v) + h(v) - h(u)$ is finite and
correct.


Solution to Problem 25-1 

a. Let $T$ be the $|V| \times |V|$ matrix representing the transitive closure, such that
$T[i, j]$ is 1 if there is a path from $i$ to $j$, and 0 if not.

Initialize $T$ (when there are no edges in $G$) as follows:

\[ T[i, j] = \begin{cases} 1 & \text{if } i = j , \\ 0 & \text{otherwise} . \end{cases} \]

$T$ can be updated as follows when an edge $(u, v)$ is added to $G$:

Transitive-Closure-Update(u, v)
    for i ← 1 to |V|
        do for j ← 1 to |V|
               do if T[i, u] = 1 and T[v, j] = 1
                     then T[i, j] ← 1

• This says that the effect of adding edge $(u, v)$ is to create a path (via the new
edge) from every vertex that could already reach $u$ to every vertex that could
already be reached from $v$.

• Note that the procedure sets $T[u, v] \leftarrow 1$, because of the initial values
$T[u, u] = T[v, v] = 1$.

• This takes $\Theta(V^2)$ time because of the two nested loops.

b. Consider inserting the edge $(v_n, v_1)$ into the straight-line graph
$v_1 \to v_2 \to \cdots \to v_n$, where $n = |V|$.

Before this edge is inserted, only $n(n + 1)/2$ entries in $T$ are 1 (the entries on
and above the main diagonal). After the edge is inserted, the graph is a cycle
in which every vertex can reach every other vertex, so all $n^2$ entries in $T$ are 1.
Hence $n^2 - (n(n + 1)/2) = \Theta(n^2) = \Theta(V^2)$ entries must be changed in $T$,
so any algorithm to update the transitive closure must take $\Omega(V^2)$ time on this
graph.

c. The algorithm in part (a) would take $\Theta(V^4)$ time to insert all possible $\Theta(V^2)$
edges, so we need a more efficient algorithm in order for any sequence of
insertions to take only $O(V^3)$ total time.

To improve the algorithm, notice that the loop over $j$ is pointless when
$T[i, v] = 1$. That is, if there is already a path $i \leadsto v$, then adding the edge $(u, v)$
can’t make any new vertices reachable from $i$. The loop to set $T[i, j]$ to 1 for $j$
such that there’s a path $v \leadsto j$ is just setting entries that are already 1. Eliminate
this redundant processing as follows:

Transitive-Closure-Update(u, v)
for i ← 1 to |V|
    do if T[i, u] = 1 and T[i, v] = 0
        then for j ← 1 to |V|
            do if T[v, j] = 1
                then T[i, j] ← 1

We show that this procedure takes $O(V^3)$ time to update the transitive closure
for any sequence of $n$ insertions:

• There can’t be more than $|V|^2$ edges in $G$, so $n \le |V|^2$.

• Summed over $n$ insertions, time for the first two lines is $O(nV) = O(V^3)$.

• The last three lines, which take $\Theta(V)$ time, are executed only $O(V^2)$ times
for $n$ insertions. To see this, notice that the last three lines are executed only
when $T[i, v] = 0$, and in that case, the last line sets $T[i, v] \gets 1$. Thus, the
number of 0 entries in $T$ is reduced by at least 1 each time the last three lines
run. Since there are only $|V|^2$ entries in $T$, these lines can run at most $|V|^2$
times.

• Hence the total running time over $n$ insertions is $O(V^3)$.
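The improved update from part (c) can be sketched in Python as follows. The function names, the 0-based vertex numbering, and the list-of-lists matrix are assumptions made for illustration; the example replays the chain-then-cycle scenario from part (b) with $n = 4$.

```python
def make_closure(n):
    """Closure matrix of an n-vertex graph with no edges: T[i][j] = 1 iff i == j."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def transitive_closure_update(T, u, v):
    """Update T in place after inserting edge (u, v); vertices are 0-based."""
    n = len(T)
    for i in range(n):
        # Only rows that reach u but do not yet reach v can gain new 1 entries.
        if T[i][u] == 1 and T[i][v] == 0:
            for j in range(n):
                if T[v][j] == 1:
                    T[i][j] = 1


T = make_closure(4)
for u, v in [(0, 1), (1, 2), (2, 3)]:    # build the chain 0 -> 1 -> 2 -> 3
    transitive_closure_update(T, u, v)
chain = [row[:] for row in T]            # snapshot: upper-triangular matrix of 1s
transitive_closure_update(T, 3, 0)       # closing edge turns the chain into a cycle
```

After the chain is built, `chain` has $n(n+1)/2 = 10$ ones; after the closing edge, every entry of `T` is 1.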
Chapter 26 overview

Network flow 

Use a graph to model material that flows through conduits. 

Each edge represents one conduit and has a capacity, which is an upper bound on
the flow rate (in units/time).

Can think of edges as pipes of different sizes. But flows don’t have to be of liquids. 
Book has an example where a flow is how many trucks per day can ship hockey 
pucks between cities. 

Want to compute max rate that we can ship material from a designated source to a 
designated sink. 


Flow networks 

$G = (V, E)$ directed.

Each edge $(u, v)$ has a capacity $c(u, v) \ge 0$.

If $(u, v) \notin E$, then $c(u, v) = 0$.

Source vertex $s$, sink vertex $t$, assume $s \leadsto v \leadsto t$ for all $v \in V$.
Example: [Edges are labeled with capacities.]



[In these notes, we define positive flow first because it’s more intuitive to students 
than the flow formulation in the book. We’ll migrate towards flow in the book 
soon. We’ll call it “net flow” at first in the lecture notes. Net flow tends to be 
mathematically neater to work with than positive flow.] 
Positive flow: A function $p : V \times V \to \mathbb{R}$ satisfying

• Capacity constraint: For all $u, v \in V$, $0 \le p(u, v) \le c(u, v)$,

• Flow conservation: For all $u \in V - \{s, t\}$,

\[ \underbrace{\sum_{v \in V} p(v, u)}_{\text{flow into } u} = \underbrace{\sum_{v \in V} p(u, v)}_{\text{flow out of } u} . \]

Equivalently, $\sum_{v \in V} p(u, v) - \sum_{v \in V} p(v, u) = 0$.

[Add flows to previous example. Edges here are labeled as flow/capacity.]

• Note that all positive flows are $\le$ capacities.

• Verify flow conservation by adding up flows at a couple of vertices. 

• Note that all positive flows = 0 is legitimate. 

Cancellation with positive flows 

• Without loss of generality, can say positive flow goes either from $u$ to $v$ or from
$v$ to $u$, but not both. (Because if not true, can transform by cancellation to be
true.)

• In the above example, we can “cancel” 1 unit of flow in each direction between
$x$ and $z$.

    1 unit $x \to z$                0 units $x \to z$
    2 units $z \to x$      ⇒       1 unit $z \to x$

• In both cases, “net flow” is 1 unit $z \to x$.

• Capacity constraint is still satisfied (because flows only decrease). 

• Flow conservation is still satisfied (flow in and flow out are both reduced by the 
same amount). 

Here’s a concept similar to positive flow: 

Net flow: A function $f : V \times V \to \mathbb{R}$ satisfying

• Capacity constraint: For all $u, v \in V$, $f(u, v) \le c(u, v)$,

• Skew symmetry: For all $u, v \in V$, $f(u, v) = -f(v, u)$,

• Flow conservation: For all $u \in V - \{s, t\}$, $\sum_{v \in V} f(u, v) = 0$.

Another way to think of flow conservation:

\[ \underbrace{\sum_{v \in V : f(v, u) > 0} f(v, u)}_{\text{total positive flow entering } u} = \underbrace{\sum_{v \in V : f(u, v) > 0} f(u, v)}_{\text{total positive flow leaving } u} . \]

“flow in = flow out”

The differences between positive flow $p$ and net flow $f$:

• $p(u, v) \ge 0$,

• $f$ satisfies skew symmetry.


Equivalence of positive flow and net flow definitions 


Define net flow in terms of positive flow:

• Define $f(u, v) = p(u, v) - p(v, u)$.

• Argue, given definition of $p$, that this definition of $f$ satisfies capacity
constraint and flow conservation.

Capacity constraint:

\[ p(u, v) \le c(u, v) \text{ and } p(v, u) \ge 0 \implies p(u, v) - p(v, u) \le c(u, v) . \]

Flow conservation:

\[ \begin{aligned}
\sum_{v \in V} f(u, v) &= \sum_{v \in V} (p(u, v) - p(v, u)) \\
&= \sum_{v \in V} p(u, v) - \sum_{v \in V} p(v, u) \\
&= 0 .
\end{aligned} \]

• Skew symmetry is trivially satisfied by this definition of $f(u, v)$:

\[ \begin{aligned}
f(u, v) &= p(u, v) - p(v, u) \\
&= -(p(v, u) - p(u, v)) \\
&= -f(v, u) .
\end{aligned} \]

Define positive flow in terms of net flow:

• Define

\[ p(u, v) = \begin{cases} f(u, v) & \text{if } f(u, v) > 0 , \\ 0 & \text{if } f(u, v) \le 0 . \end{cases} \]

• Argue, given definition of $f$, that this definition of $p$ satisfies capacity
constraint and flow conservation.

Capacity constraint:

• If $f(u, v) > 0$:

  $f(u, v) \le c(u, v) \implies 0 \le p(u, v) \le c(u, v)$.

• If $f(u, v) \le 0$:

  $0 = p(u, v) \le c(u, v)$.
Flow conservation:

\[ \begin{aligned}
\sum_{v \in V} p(u, v) - \sum_{v \in V} p(v, u)
&= \sum_{v \in V : f(u, v) > 0} p(u, v) + \sum_{v \in V : f(u, v) \le 0} p(u, v)
 - \sum_{v \in V : f(v, u) \le 0} p(v, u) - \sum_{v \in V : f(v, u) > 0} p(v, u) \\
&= \sum_{v \in V : f(u, v) > 0} p(u, v) + 0 - 0 - \sum_{v \in V : f(v, u) > 0} p(v, u) \\
&= \sum_{v \in V : f(u, v) > 0} f(u, v) - \sum_{v \in V : f(v, u) > 0} f(v, u) \\
&= \sum_{v \in V : f(u, v) > 0} f(u, v) - \sum_{v \in V : f(u, v) < 0} (-f(u, v)) \\
&= \sum_{v \in V} f(u, v) \\
&= 0 .
\end{aligned} \]
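The two conversions between positive flow and net flow can be tried on a small instance. The following Python sketch uses a hypothetical three-vertex network (not the example from these notes); the function names are assumptions made for illustration.

```python
from itertools import product

V = ["s", "a", "t"]
# Hypothetical capacities and positive flow, chosen so that conservation holds at a.
c = {("s", "a"): 4, ("a", "s"): 1, ("a", "t"): 2}
p = {("s", "a"): 3, ("a", "s"): 1, ("a", "t"): 2}


def net_from_positive(p, V):
    """f(u, v) = p(u, v) - p(v, u) for every ordered pair of vertices."""
    return {(u, v): p.get((u, v), 0) - p.get((v, u), 0) for u, v in product(V, V)}


def positive_from_net(f):
    """p(u, v) = f(u, v) if f(u, v) > 0, and 0 otherwise."""
    return {(u, v): max(x, 0) for (u, v), x in f.items()}


f = net_from_positive(p, V)
p2 = positive_from_net(f)   # the cancelled version of p
```

Converting `p` to `f` and back yields `p2`, in which the 1 unit shipped in each direction between `s` and `a` has been cancelled: 2 units go from `s` to `a` and none go back.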

[We’ll use net flow, instead of positive flow, for the rest of our discussion, in order
to cut the number of summations down by half. From now on, we’ll just call it
“flow” rather than “net flow.”]

Value of flow $f$ = $|f| = \sum_{v \in V} f(s, v)$ = total flow out of source.

Consider the example below. [The cancellation possible in the previous example
has been made here. Also, showing only flows that are positive.]



Cancellation with flow 

If we have edges $(u, v)$ and $(v, u)$, skew symmetry makes it so that at most one of
these edges has positive flow.

Say $f(u, v) = 5$. If we “ship” 2 units $v \to u$, we lower $f(u, v)$ to 3. The 2 units
$v \to u$ cancel 2 of the $u \to v$ units.

Due to cancellation, a flow only gives us this “net” effect. We cannot reconstruct
actual shipments from a flow.

    5 units $u \to v$                   8 units $u \to v$
    0 units $v \to u$     same as      3 units $v \to u$

We could add another 3 units of flow $u \to v$ and another 3 units $v \to u$,
maintaining flow conservation.

The flow from $u$ to $v$ would remain $f(u, v) = 5$, and $f(v, u) = -5$.

Maximum-flow problem: Given $G$, $s$, $t$, and $c$, find a flow whose value is
maximum.

Implicit summation notation 

We work with functions, like /, that take pairs of vertices as arguments. 

Extend to take sets of vertices, with interpretation of summing over all vertices in 
the set. 

Example: If $X$ and $Y$ are sets of vertices,

\[ f(X, Y) = \sum_{x \in X} \sum_{y \in Y} f(x, y) . \]

Therefore, can express flow conservation as $f(u, V) = 0$, for all $u \in V - \{s, t\}$.

Notation: Omit braces in sets with implicit summations.

Example: $f(s, V - s) = f(s, V)$. Here, $f(s, V - s)$ really means $f(s, V - \{s\})$.

Lemma 

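For any flow $f$ in $G = (V, E)$:

1. For all $X \subseteq V$, $f(X, X) = 0$,

2. For all $X, Y \subseteq V$, $f(X, Y) = -f(Y, X)$,

3. For all $X, Y, Z \subseteq V$ such that $X \cap Y = \emptyset$,
$f(X \cup Y, Z) = f(X, Z) + f(Y, Z)$ and
$f(Z, X \cup Y) = f(Z, X) + f(Z, Y)$.

[Leave on board.]

Proof

2. \[ \begin{aligned}
f(X, Y) &= \sum_{x \in X} \sum_{y \in Y} f(x, y) \\
&= \sum_{x \in X} \sum_{y \in Y} -f(y, x) && \text{(skew symmetry)} \\
&= -\sum_{y \in Y} \sum_{x \in X} f(y, x) \\
&= -f(Y, X) .
\end{aligned} \]

1. $f(X, X) = -f(X, X)$ (part (2)) $\implies f(X, X) = 0$.

3. \[ \begin{aligned}
f(X \cup Y, Z) &= \sum_{v \in X \cup Y} \sum_{z \in Z} f(v, z) \\
&= \sum_{v \in X} \Bigl( \sum_{z \in Z} f(v, z) \Bigr) + \sum_{v \in Y} \Bigl( \sum_{z \in Z} f(v, z) \Bigr) && (X \cap Y = \emptyset) \\
&= f(X, Z) + f(Y, Z) .
\end{aligned} \]

Other part is symmetric. ■ (lemma)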

Example of using this lemma: 

Lemma 

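$|f| = f(V, t)$.

Proof First, show that $f(V, V - s - t) = 0$:

\[ \begin{aligned}
& f(u, V) = 0 \text{ for all } u \in V - \{s, t\} \\
\implies\; & f(V - s - t, V) = 0 && \text{(add up } f(u, V) \text{ for all } u \in V - \{s, t\}) \\
\implies\; & f(V, V - s - t) = 0 && \text{(by lemma, part (2))} .
\end{aligned} \]

Thus,

\[ \begin{aligned}
|f| &= f(s, V) && \text{(definition of } |f|) \\
&= f(V, V) - f(V - s, V) && \text{(lemma, part (3))} \\
&= -f(V - s, V) && \text{(lemma, part (1))} \\
&= f(V, V - s) && \text{(lemma, part (2))} \\
&= f(V, t) + f(V, V - s - t) && \text{(lemma, part (3))} \\
&= f(V, t) && \text{(from above)}
\end{aligned} \]

■ (lemma)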

Cuts 



A cut $(S, T)$ of flow network $G = (V, E)$ is a partition of $V$ into $S$ and $T = V - S$
such that $s \in S$ and $t \in T$.

• Similar to cut used in minimum spanning trees, except that here the graph is
directed, and we require $s \in S$ and $t \in T$.

For flow $f$, the net flow across cut $(S, T)$ is $f(S, T)$.

Capacity of cut $(S, T)$ is $c(S, T)$.

A minimum cut of G is a cut whose capacity is minimum over all cuts of G. 
For our example: [Leave on board.] 
Consider the cut $S = \{s, w, y\}$, $T = \{x, z, t\}$.

\[ \begin{aligned}
f(S, T) &= f(w, x) + f(y, x) + f(y, z) \\
&= 2 + (-1) + 2 \\
&= 3 .
\end{aligned} \]

\[ \begin{aligned}
c(S, T) &= c(w, x) + c(y, z) \\
&= 2 + 3 \\
&= 5 .
\end{aligned} \]

Note the difference between capacity and flow.

• Flow obeys skew symmetry, so $f(y, x) = -f(x, y) = -1$.

• Capacity does not: $c(y, x) = 0$, but $c(x, y) = 1$.

So include flows going both ways across the cut, but capacity going only $S$ to $T$.

Now consider the cut $S = \{s, w, x, y\}$, $T = \{z, t\}$.

\[ \begin{aligned}
f(S, T) &= f(x, z) + f(x, t) + f(y, z) \\
&= -1 + 2 + 2 \\
&= 3 .
\end{aligned} \]

\[ \begin{aligned}
c(S, T) &= c(x, z) + c(x, t) + c(y, z) \\
&= 2 + 3 + 3 \\
&= 8 .
\end{aligned} \]

Same flow as previous cut, higher capacity.

Lemma 

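For any cut $(S, T)$, $f(S, T) = |f|$.

[Leave on board.]

Proof First, show that $f(S - s, V) = 0$:

\[ S - \{s\} \subseteq V - \{s, t\} . \]

Therefore,

\[ \begin{aligned}
f(S - s, V) &= \sum_{u \in S - \{s\}} f(u, V) \\
&= \sum_{u \in S - \{s\}} 0 && \text{(flow conservation and } S - \{s\} \subseteq V - \{s, t\}) \\
&= 0 .
\end{aligned} \]

So,

\[ \begin{aligned}
f(S, T) &= f(S, V) - f(S, S) && \text{(lemma, part (3), } S \cup T = V,\ S \cap T = \emptyset) \\
&= f(S, V) && \text{(lemma, part (1))} \\
&= f(s, V) + f(S - s, V) && \text{(lemma, part (3))} \\
&= f(s, V) && (f(S - s, V) = 0) \\
&= |f| .
\end{aligned} \]

■ (lemma)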
Corollary

The value of any flow $\le$ capacity of any cut.

[Leave on board.]

Proof Let $(S, T)$ be any cut, $f$ be any flow.

\[ \begin{aligned}
|f| &= f(S, T) && \text{(lemma)} \\
&= \sum_{u \in S} \sum_{v \in T} f(u, v) \\
&\le \sum_{u \in S} \sum_{v \in T} c(u, v) && \text{(capacity constraints)} \\
&= c(S, T) .
\end{aligned} \]

■ (corollary)

Therefore, maximum flow $\le$ capacity of minimum cut.

Will see a little later that this is in fact an equality. 
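The lemma and corollary can be checked by brute force on a small instance. The Python sketch below uses a hypothetical four-vertex network and flow (not the example left on the board); `total` is an assumed helper implementing the implicit summation notation.

```python
from itertools import combinations

# Hypothetical network. pos lists only the positive net flows; the negative
# values follow from skew symmetry.
c = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
pos = {("s", "a"): 3, ("s", "b"): 1, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 2}
V = ["s", "a", "b", "t"]


def f(u, v):
    """Net flow from u to v."""
    return pos.get((u, v), 0) - pos.get((v, u), 0)


def cap(u, v):
    """Capacity c(u, v), which is 0 when (u, v) is not an edge."""
    return c.get((u, v), 0)


def total(g, X, Y):
    """Implicit summation: g(X, Y) = sum of g(x, y) over x in X and y in Y."""
    return sum(g(x, y) for x in X for y in Y)


value = total(f, ["s"], V)               # |f| = f(s, V)
inner = [v for v in V if v not in ("s", "t")]
cuts = []                                # (S, f(S, T), c(S, T)) for every cut
for k in range(len(inner) + 1):
    for extra in combinations(inner, k):
        S = ["s", *extra]
        T = [v for v in V if v not in S]
        cuts.append((S, total(f, S, T), total(cap, S, T)))
```

Every cut carries the same net flow `value`, while the capacities differ from cut to cut and are never smaller than `value`.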


The Ford-Fulkerson method 

Residual network 

Given a flow $f$ in network $G = (V, E)$.

Consider a pair of vertices $u, v \in V$.

How much additional flow can we push directly from $u$ to $v$?
That’s the residual capacity,

\[ \begin{aligned}
c_f(u, v) &= c(u, v) - f(u, v) \\
&\ge 0 && \text{(since } f(u, v) \le c(u, v)) .
\end{aligned} \]

Residual network: $G_f = (V, E_f)$,

\[ E_f = \{(u, v) \in V \times V : c_f(u, v) > 0\} . \]

Each edge of the residual network can admit a positive flow.
For our example:

Every edge $(u, v) \in E_f$ corresponds to an edge $(u, v) \in E$ or $(v, u) \in E$ (or both).
Therefore, $|E_f| \le 2 |E|$.
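As a minimal sketch, the residual capacities can be computed directly from the definition. The network and flow below are hypothetical (not the example from the notes), and `residual_capacities` is an assumed helper name.

```python
# Hypothetical capacities, and the positive net flows of some flow f.
c = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
pos = {("s", "a"): 3, ("s", "b"): 1, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 2}


def residual_capacities(c, pos):
    """Return {(u, v): c_f(u, v)} for every pair with c_f(u, v) > 0."""
    cf = {}
    for (u, v) in c:
        # A residual edge can only arise from an edge of E or its reversal.
        for a, b in ((u, v), (v, u)):
            flow = pos.get((a, b), 0) - pos.get((b, a), 0)   # net flow f(a, b)
            r = c.get((a, b), 0) - flow
            if r > 0:
                cf[(a, b)] = r
    return cf


cf = residual_capacities(c, pos)
```

Saturated edges such as `("s", "a")` vanish from `cf`, while their reversals appear with residual capacity equal to the flow that could be cancelled.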

Given flows $f_1$ and $f_2$, the flow sum $f_1 + f_2$ is defined by
$(f_1 + f_2)(u, v) = f_1(u, v) + f_2(u, v)$
for all $u, v \in V$.

Lemma

Given a flow network $G$, a flow $f$ in $G$, and the residual network $G_f$, let $f'$
be any flow in $G_f$. Then the flow sum $f + f'$ is a flow in $G$ with value

\[ |f + f'| = |f| + |f'| . \]

[See book for proof.]

Augmenting path 

A path $s \leadsto t$ in $G_f$.

• Admits more flow along each edge.

• Like a sequence of pipes through which we can squirt more flow from $s$ to $t$.

How much more flow can we push from $s$ to $t$ along augmenting path $p$?

\[ c_f(p) = \min \{ c_f(u, v) : (u, v) \text{ is on } p \} . \]

For our example, consider the augmenting path $p = \langle s, w, y, z, x, t \rangle$.
Minimum residual capacity is 1.

After we push 1 additional unit along $p$: [Continue from $G$ left on board from
before.]

Observe that $G_f$ now has no augmenting path. Why? No edges cross the cut
$(\{s, w\}, \{x, y, z, t\})$ in the forward direction in $G_f$. So no path can get from $s$ to $t$.

Claim that the flow shown in $G$ is a maximum flow.


Lemma 

Given flow network $G$, flow $f$ in $G$, residual network $G_f$. Let $p$ be an augmenting
path in $G_f$. Define $f_p : V \times V \to \mathbb{R}$:

\[ f_p(u, v) = \begin{cases}
c_f(p) & \text{if } (u, v) \text{ is on } p , \\
-c_f(p) & \text{if } (v, u) \text{ is on } p , \\
0 & \text{otherwise} .
\end{cases} \]

Then $f_p$ is a flow in $G_f$ with value $|f_p| = c_f(p) > 0$.
Corollary

Given flow network $G$, flow $f$ in $G$, and an augmenting path $p$ in $G_f$, define $f_p$
as in lemma, and define $f' : V \times V \to \mathbb{R}$ by $f' = f + f_p$. Then $f'$ is a flow in $G$
with value $|f'| = |f| + c_f(p) > |f|$.

Theorem (Max-flow min-cut theorem)

The following are equivalent:

1. $f$ is a maximum flow.

2. $f$ admits no augmenting path.

3. $|f| = c(S, T)$ for some cut $(S, T)$.

Proof

(1) ⇒ (2): If $f$ admits an augmenting path $p$, then (by above corollary) would get
a flow with value $|f| + c_f(p) > |f|$, so $f$ wasn’t a max flow to start with.

(2) ⇒ (3): Suppose $f$ admits no augmenting path. Define

\[ \begin{aligned}
S &= \{v \in V : \text{there exists a path } s \leadsto v \text{ in } G_f\} , \\
T &= V - S .
\end{aligned} \]

Must have $t \in T$; otherwise there is an augmenting path.

Therefore, $(S, T)$ is a cut.

For each $u \in S$ and $v \in T$, must have $f(u, v) = c(u, v)$, since otherwise
$(u, v) \in E_f$ and then $v \in S$.

By lemma ($f(S, T) = |f|$), $|f| = f(S, T) = c(S, T)$.

(3) ⇒ (1): By corollary, $|f| \le c(S, T)$.

$|f| = c(S, T) \implies f$ is a max flow. ■ (theorem)
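The construction in the (2) ⇒ (3) step can be sketched in Python: given a flow with no augmenting path, collect the vertices reachable from $s$ in $G_f$. The network and the maximum flow below are hypothetical and supplied by hand; `residual` is an assumed helper name.

```python
from collections import deque

V = ["s", "a", "b", "t"]
# Hypothetical capacities and a maximum flow (positive net flows only).
c = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
f_pos = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}


def residual(u, v):
    """c_f(u, v) = c(u, v) - f(u, v), with f obtained by skew symmetry."""
    return c.get((u, v), 0) - (f_pos.get((u, v), 0) - f_pos.get((v, u), 0))


# S = vertices reachable from s in the residual network G_f.
S = {"s"}
queue = deque(["s"])
while queue:
    u = queue.popleft()
    for v in V:
        if v not in S and residual(u, v) > 0:
            S.add(v)
            queue.append(v)
T = set(V) - S

cut_capacity = sum(c.get((u, v), 0) for u in S for v in T)
value = sum(f_pos.get(("s", v), 0) - f_pos.get((v, "s"), 0) for v in V)
```

Because both edges leaving `s` are saturated, `S` contains only `s`, and the capacity of the resulting cut equals the flow value.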


Ford-Fulkerson algorithm 

Ford-Fulkerson(V, E, s, t)
for all (u, v) ∈ E
    do f[u, v] ← f[v, u] ← 0
while there is an augmenting path p in G_f
    do augment f by c_f(p)

Subtle difference between $f[u, v]$ and $f(u, v)$:

• $f(u, v)$ is a function, defined on all $u, v \in V$.

• $f[u, v]$ is a value computed by algorithm.

• $f[u, v] = f(u, v)$ where $(u, v) \in E$ or $(v, u) \in E$.

• $f[u, v]$ is undefined if neither $(u, v) \in E$ nor $(v, u) \in E$.
Analysis: If capacities are all integer, then each augmenting path raises $|f|$ by $\ge 1$.
If max flow is $f^*$, then need $\le |f^*|$ iterations ⇒ time is $O(E \, |f^*|)$.

[Handwaving; see book for better explanation.]

Note that this running time is not polynomial in input size. It depends on $|f^*|$,
which is not a function of $|V|$ and $|E|$.

If capacities are rational, can scale them to integers. 

If irrational, Ford-Fulkerson might never terminate! 


Edmonds-Karp algorithm 

Do Ford-Fulkerson, but compute augmenting paths by BFS of $G_f$. Augmenting
paths are shortest paths $s \leadsto t$ in $G_f$, with all edge weights = 1.

Edmonds-Karp runs in $O(V E^2)$ time.

To prove, need to look at distances to vertices in $G_f$.

Let $\delta_f(u, v)$ = shortest path distance $u$ to $v$ in $G_f$, with unit edge weights.

Lemma

For all $v \in V - \{s, t\}$, $\delta_f(s, v)$ increases monotonically with each flow
augmentation.

Proof Suppose there exists $v \in V - \{s, t\}$ such that there is a flow augmentation
that causes $\delta_f(s, v)$ to decrease. Will derive a contradiction.

Let $f$ be the flow before the first augmentation that causes a shortest-path distance
to decrease, $f'$ be the flow afterward.

Let $v$ be a vertex with minimum $\delta_{f'}(s, v)$ whose distance was decreased by the
augmentation, so $\delta_{f'}(s, v) < \delta_f(s, v)$.

Let a shortest path $s$ to $v$ in $G_{f'}$ be $s \leadsto u \to v$, so $(u, v) \in E_{f'}$ and
$\delta_{f'}(s, v) = \delta_{f'}(s, u) + 1$. (Or $\delta_{f'}(s, u) = \delta_{f'}(s, v) - 1$.)

Since $\delta_{f'}(s, u) < \delta_{f'}(s, v)$ and how we chose $v$, we have $\delta_{f'}(s, u) \ge \delta_f(s, u)$.

Claim

$(u, v) \notin E_f$.

Proof If $(u, v) \in E_f$, then

\[ \begin{aligned}
\delta_f(s, v) &\le \delta_f(s, u) + 1 && \text{(triangle inequality)} \\
&\le \delta_{f'}(s, u) + 1 \\
&= \delta_{f'}(s, v) ,
\end{aligned} \]

contradicting $\delta_{f'}(s, v) < \delta_f(s, v)$. ■ (claim)

How can $(u, v) \notin E_f$ and $(u, v) \in E_{f'}$?
The augmentation must increase flow $v$ to $u$.

Since Edmonds-Karp augments along shortest paths, the shortest path $s$ to $u$ in $G_f$
has $v \to u$ as its last edge.

Therefore,

\[ \begin{aligned}
\delta_f(s, v) &= \delta_f(s, u) - 1 \\
&\le \delta_{f'}(s, u) - 1 \\
&= \delta_{f'}(s, v) - 2 ,
\end{aligned} \]

contradicting $\delta_{f'}(s, v) < \delta_f(s, v)$.

Therefore, $v$ cannot exist. ■ (lemma)

Theorem 

Edmonds-Karp performs O(VE) augmentations. 

Proof Suppose $p$ is an augmenting path and $c_f(u, v) = c_f(p)$. Then call $(u, v)$ a
critical edge in $G_f$, and it disappears from the residual network after an
augmentation along $p$.

$\ge 1$ edge on any augmenting path is critical.

Will show that each of the $|E|$ edges can become critical $\le |V|/2 - 1$ times.

Consider $u, v \in V$ such that either $(u, v) \in E$ or $(v, u) \in E$ or both. Since
augmenting paths are shortest paths, when $(u, v)$ becomes critical first time,
$\delta_f(s, v) = \delta_f(s, u) + 1$.

Augment flow, so that $(u, v)$ disappears from the residual network. This edge cannot
reappear in the residual network until flow from $u$ to $v$ decreases, which happens
only if $(v, u)$ is on an augmenting path in $G_{f'}$: $\delta_{f'}(s, u) = \delta_{f'}(s, v) + 1$. ($f'$ is
flow when this occurs.)

By lemma, $\delta_f(s, v) \le \delta_{f'}(s, v)$ ⇒

\[ \begin{aligned}
\delta_{f'}(s, u) &= \delta_{f'}(s, v) + 1 \\
&\ge \delta_f(s, v) + 1 \\
&= \delta_f(s, u) + 2 .
\end{aligned} \]

Therefore, from the time $(u, v)$ becomes critical to the next time, distance of $u$
from $s$ increases by $\ge 2$. Initially, distance to $u$ is $\ge 0$, and augmenting path can’t
have $s$, $u$, and $t$ as intermediate vertices.

Therefore, until $u$ becomes unreachable from source, its distance is $\le |V| - 2$ ⇒
$(u, v)$ can become critical $\le (|V| - 2)/2 = |V|/2 - 1$ times.

Since $O(E)$ pairs of vertices can have an edge between them in residual graph,
total # of critical edges is $O(VE)$. Since each augmenting path has $\ge 1$ critical
edge, have $O(VE)$ augmentations. ■ (theorem)

Use BFS to find each augmenting path in $O(E)$ time ⇒ $O(V E^2)$ time.
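A compact Python sketch of Edmonds-Karp follows. It is one possible rendering of the method described above, not code from the book; the dictionary-based representation, the function name, and the sample network are assumptions, and it presumes $s \ne t$.

```python
from collections import deque


def edmonds_karp(capacity, s, t):
    """Return (value, flow); capacity maps (u, v) -> c(u, v) >= 0, and s != t."""
    cf = {}                      # residual capacities c_f
    adj = {s: set(), t: set()}   # neighbors in either direction
    for (u, v), cap in capacity.items():
        cf[(u, v)] = cf.get((u, v), 0) + cap
        cf.setdefault((v, u), 0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    value = 0
    while True:
        # BFS from s in G_f finds a shortest augmenting path, if one exists.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cf[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        # Walk back from t to recover the path and its bottleneck c_f(p).
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cf[e] for e in path)
        for u, v in path:
            cf[(u, v)] -= bottleneck
            cf[(v, u)] += bottleneck
        value += bottleneck

    # Net flow on each original edge: f(u, v) = c(u, v) - c_f(u, v).
    flow = {(u, v): cap - cf[(u, v)] for (u, v), cap in capacity.items()}
    return value, flow


# Hypothetical network for illustration.
c = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
value, flow = edmonds_karp(c, "s", "t")
```

On this network the flow value matches the capacity of the cut $(\{s\}, V - \{s\})$, which is 5.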

Can get better bounds. 

Push-relabel algorithms in Sections 26.4-26.5 give $O(V^3)$.

Can do even better. 
Maximum bipartite matching

Example of a problem that can be solved by turning it into a flow problem.

$G = (V, E)$ (undirected) is bipartite if we can partition $V = L \cup R$ such that all
edges in $E$ go between $L$ and $R$.

[Figure: a bipartite graph with parts $L$ and $R$, shown twice: once with a matching and once with a maximum matching.]

A matching is a subset of edges $M \subseteq E$ such that for all $v \in V$, $\le 1$ edge of $M$
is incident on $v$. (Vertex $v$ is matched if an edge of $M$ is incident on it; otherwise
unmatched.)

Maximum matching: a matching of maximum cardinality. ($M$ is a maximum
matching if $|M| \ge |M'|$ for all matchings $M'$.)

Problem: Given a bipartite graph (with the partition), find a maximum matching. 

Application: Matching planes to routes. 

• L = set of planes. 

• R = set of routes. 

• $(u, v) \in E$ if plane $u$ can fly route $v$.

• Want maximum # of routes to be served by planes.

Given $G$, define flow network $G' = (V', E')$.

• $V' = V \cup \{s, t\}$.

• $E' = \{(s, u) : u \in L\} \cup \{(u, v) : u \in L, v \in R, (u, v) \in E\} \cup \{(v, t) : v \in R\}$.

• $c(u, v) = 1$ for all $(u, v) \in E'$.

Each vertex in $V$ has $\ge 1$ incident edge ⇒ $|E| \ge |V|/2$.

Therefore, $|E| \le |E'| = |E| + |V| \le 3 |E|$.

Therefore, $|E'| = \Theta(E)$.

Find a max flow in $G'$. Book shows that it will have integer values for all $(u, v)$.
Use edges that carry flow of 1 in matching.

Book proves this gives maximum matching. 
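The reduction can be sketched in Python: build $G'$ with unit capacities, push one unit along each augmenting path found by BFS, and read the matching off the saturated $L$-to-$R$ edges. The function name, the source and sink labels, and the plane/route instance are hypothetical; $L$ and $R$ are assumed disjoint.

```python
from collections import deque


def max_bipartite_matching(L, R, edges):
    """Return a maximum matching as a set of (u, v) pairs with u in L, v in R."""
    s, t = ("source",), ("sink",)   # hypothetical labels, assumed not to be vertices
    cf = {}                         # residual capacities in G'
    adj = {}

    def add_edge(u, v):
        cf[(u, v)] = 1              # unit capacity
        cf.setdefault((v, u), 0)    # reverse residual edge
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    for u in L:
        add_edge(s, u)
    for u, v in edges:
        add_edge(u, v)
    for v in R:
        add_edge(v, t)

    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and cf[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        v = t
        while parent[v] is not None:   # all residual capacities are 0 or 1: push 1 unit
            u = parent[v]
            cf[(u, v)] -= 1
            cf[(v, u)] += 1
            v = u

    # Edges of G that carry one unit of flow form the matching.
    return {(u, v) for u, v in edges if cf[(u, v)] == 0}


planes = ["p1", "p2", "p3"]
routes = ["r1", "r2", "r3"]
can_fly = [("p1", "r1"), ("p1", "r2"), ("p2", "r1"), ("p3", "r1")]
M = max_bipartite_matching(planes, routes, can_fly)
```

In this instance only routes `r1` and `r2` can be served, so the maximum matching has two edges.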
Solution to Exercise 26.1-4

We want to prove the following lemma.

Lemma

For any flow $f$ in $G = (V, E)$:

1. For all $X \subseteq V$, $f(X, X) = 0$,

2. For all $X, Y \subseteq V$, $f(X, Y) = -f(Y, X)$,

3. For all $X, Y, Z \subseteq V$ such that $X \cap Y = \emptyset$,
$f(X \cup Y, Z) = f(X, Z) + f(Y, Z)$ and
$f(Z, X \cup Y) = f(Z, X) + f(Z, Y)$.

Proof

2. \[ \begin{aligned}
f(X, Y) &= \sum_{x \in X} \sum_{y \in Y} f(x, y) \\
&= \sum_{x \in X} \sum_{y \in Y} -f(y, x) && \text{(skew symmetry)} \\
&= -\sum_{y \in Y} \sum_{x \in X} f(y, x) \\
&= -f(Y, X) .
\end{aligned} \]

1. $f(X, X) = -f(X, X) \implies f(X, X) = 0$.

3. \[ \begin{aligned}
f(X \cup Y, Z) &= \sum_{v \in X \cup Y} \sum_{z \in Z} f(v, z) \\
&= \sum_{v \in X} \Bigl( \sum_{z \in Z} f(v, z) \Bigr) + \sum_{v \in Y} \Bigl( \sum_{z \in Z} f(v, z) \Bigr) && (X \cap Y = \emptyset) \\
&= f(X, Z) + f(Y, Z) .
\end{aligned} \]

The other part is symmetric.

■ (lemma)
Solution to Exercise 26.1-6

The flow sum $f_1 + f_2$ satisfies skew symmetry and flow conservation, but might
violate the capacity constraint.

We give proofs for skew symmetry and flow conservation and an example that
shows a violation of the capacity constraint. Let $f(u, v) = (f_1 + f_2)(u, v)$.

For skew symmetry:

\[ \begin{aligned}
f(u, v) &= f_1(u, v) + f_2(u, v) \\
&= -f_1(v, u) - f_2(v, u) && \text{(skew symmetry)} \\
&= -(f_1(v, u) + f_2(v, u)) \\
&= -f(v, u) .
\end{aligned} \]

For flow conservation, let $u \in V - \{s, t\}$:

\[ \begin{aligned}
\sum_{v \in V} f(u, v) &= \sum_{v \in V} (f_1(u, v) + f_2(u, v)) \\
&= \sum_{v \in V} f_1(u, v) + \sum_{v \in V} f_2(u, v) \\
&= 0 + 0 && \text{(flow conservation)} \\
&= 0 .
\end{aligned} \]

For the capacity constraint, let $V = \{s, t\}$, $E = \{(s, t)\}$, and $c(s, t) = 1$. Let
$f_1(s, t) = f_2(s, t) = 1$. Then $f_1$ and $f_2$ obey the capacity constraint, but
$(f_1 + f_2)(s, t) = 2$, which violates the capacity constraint.


Solution to Exercise 26.1-7 

To see that the flows form a convex set, we show that if $f_1$ and $f_2$ are flows, then
so is $\alpha f_1 + (1 - \alpha) f_2$ for all $\alpha$ such that $0 \le \alpha \le 1$.

For the capacity constraint, first observe that $\alpha \le 1$ implies that $1 - \alpha \ge 0$. Thus,
for any $u, v \in V$, we have

\[ \begin{aligned}
\alpha f_1(u, v) + (1 - \alpha) f_2(u, v) &\ge 0 \cdot f_1(u, v) + 0 \cdot (1 - \alpha) f_2(u, v) \\
&= 0 .
\end{aligned} \]

Since $f_1(u, v) \le c(u, v)$ and $f_2(u, v) \le c(u, v)$, we also have

\[ \begin{aligned}
\alpha f_1(u, v) + (1 - \alpha) f_2(u, v) &\le \alpha c(u, v) + (1 - \alpha) c(u, v) \\
&= (\alpha + (1 - \alpha)) c(u, v) \\
&= c(u, v) .
\end{aligned} \]

For skew symmetry, we have $f_1(u, v) = -f_1(v, u)$ and $f_2(u, v) = -f_2(v, u)$ for
any $u, v \in V$. Thus, we have

\[ \begin{aligned}
\alpha f_1(u, v) + (1 - \alpha) f_2(u, v) &= -\alpha f_1(v, u) - (1 - \alpha) f_2(v, u) \\
&= -(\alpha f_1(v, u) + (1 - \alpha) f_2(v, u)) .
\end{aligned} \]

For flow conservation, observe that since $f_1$ and $f_2$ obey flow conservation, we
have $\sum_{v \in V} f_1(u, v) = 0$ and $\sum_{v \in V} f_2(u, v) = 0$ for any $u \in V - \{s, t\}$. Thus,

\[ \begin{aligned}
\sum_{v \in V} (\alpha f_1(u, v) + (1 - \alpha) f_2(u, v)) &= \alpha \sum_{v \in V} f_1(u, v) + (1 - \alpha) \sum_{v \in V} f_2(u, v) \\
&= \alpha \cdot 0 + (1 - \alpha) \cdot 0 \\
&= 0 .
\end{aligned} \]

Solution to Exercise 26.1-9 

Create a vertex for each corner, and if there is a street between corners $u$ and $v$,
create directed edges $(u, v)$ and $(v, u)$. Set the capacity of each edge to 1. Let the
source be the corner on which the professor’s house sits, and let the sink be the corner
on which the school is located. We wish to find a flow of value 2 that also has the
property that $f(u, v)$ is an integer for all vertices $u$ and $v$. Such a flow represents
two edge-disjoint paths from the house to the school.


Solution to Exercise 26.2-4 


\[ \begin{aligned}
c_f(u, v) + c_f(v, u) &= c(u, v) - f(u, v) + c(v, u) - f(v, u) && \text{(by definition)} \\
&= c(u, v) + c(v, u) && \text{(by skew symmetry: } f(u, v) = -f(v, u)) .
\end{aligned} \]


Solution to Exercise 26.2-9 

For any two vertices $u$ and $v$ in $G$, you can define a flow network $G_{uv}$ consisting
of the directed version of $G$ with all edge capacities set to 1, $s = u$, and $t = v$.
($G_{uv}$ has $O(V)$ vertices (actually, $|V|$) and $O(E)$ edges, as required. We want
all capacities to be 1 so that the number of edges crossing a cut equals the capacity
of the cut.) Let $f_{uv}$ denote a maximum flow in $G_{uv}$.

We claim that for any $u \in V$, the edge connectivity $k$ equals $\min_{v \in V - \{u\}} |f_{uv}|$. We’ll
show below that this claim holds. Assuming that it holds, we can find $k$ as follows:

Edge-Connectivity(G)
select any vertex u ∈ V
for each vertex v ∈ V − {u}        ▷ |V| − 1 iterations
    do set up the flow network G_uv as described above
       find the maximum flow f_uv on G_uv
return the minimum of the |V| − 1 max-flow values: min over v ∈ V − {u} of |f_uv|

The claim follows from the max-flow min-cut theorem and how we chose
capacities so that the capacity of a cut is the number of edges crossing it. We prove
that $k = \min_{v \in V - \{u\}} |f_{uv}|$, for any $u \in V$, by showing separately that $k$ is at least this
minimum and that $k$ is at most this minimum.

• Proof that $k \ge \min_{v \in V - \{u\}} |f_{uv}|$:

Let $m = \min_{v \in V - \{u\}} |f_{uv}|$. Suppose we remove only $m - 1$ edges from $G$. For
any vertex $v$, by the max-flow min-cut theorem, $u$ and $v$ are still connected.
(The max flow from $u$ to $v$ is at least $m$, hence any cut separating $u$ from $v$ has
capacity at least $m$, which means at least $m$ edges cross any such cut. Thus at
least one edge is left crossing the cut when we remove $m - 1$ edges.) Thus every
node is connected to $u$, which implies that the graph is still connected. So at
least $m$ edges must be removed to disconnect the graph, i.e., $k \ge \min_{v \in V - \{u\}} |f_{uv}|$.

• Proof that $k \le \min_{v \in V - \{u\}} |f_{uv}|$:

Consider a vertex $v$ with the minimum $|f_{uv}|$. By the max-flow min-cut theorem,
there is a cut of capacity $|f_{uv}|$ separating $u$ and $v$. Since all edge capacities are 1,
exactly $|f_{uv}|$ edges cross this cut. If these edges are removed, there is no path
from $u$ to $v$, and so our graph becomes disconnected. Hence $k \le \min_{v \in V - \{u\}} |f_{uv}|$.

• Thus, the claim that $k = \min_{v \in V - \{u\}} |f_{uv}|$ for any $u \in V$ is true.
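The Edge-Connectivity procedure can be sketched in Python as follows. For brevity the sketch uses an adjacency matrix and a simple BFS-based unit-capacity max flow, so it does not attain the best running time; the function names and 0-based vertex numbering are assumptions, and the graph is assumed to have at least two vertices.

```python
from collections import deque


def unit_max_flow(n, edges, s, t):
    """Max flow from s to t when each undirected edge becomes two unit-capacity arcs."""
    cf = [[0] * n for _ in range(n)]   # residual capacities
    for a, b in edges:
        cf[a][b] += 1
        cf[b][a] += 1
    value = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cf[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return value               # no augmenting path remains
        v = t
        while v != s:                  # push one unit along the path found
            u = parent[v]
            cf[u][v] -= 1
            cf[v][u] += 1
            v = u
        value += 1


def edge_connectivity(n, edges):
    """Minimum, over all v != u, of the max flow from the fixed vertex u = 0 to v."""
    u = 0
    return min(unit_max_flow(n, edges, u, v) for v in range(1, n))


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
path = [(0, 1), (1, 2)]
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
```

A 4-cycle needs two edge removals to disconnect, a path needs one, and the complete graph on four vertices needs three.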


Solution to Exercise 26.2-10 

From the time $(u, v)$ is a critical edge until it is again a critical edge, $\delta(s, u)$
increases by at least 2, as shown in Theorem 26.9. Similarly, you can show
that $\delta(v, t)$ also increases by at least 2. Thus the length of the augmenting path
$s \leadsto u \to v \leadsto t$ increases by at least 4 between times $(u, v)$ is critical. Since the
length of an augmenting path cannot exceed $|V| - 1$, $(u, v)$ can be critical $< |V|/4$
times.

Edmonds-Karp terminates when there are no more augmenting paths, so it
must terminate when there are no more critical edges, which takes at most
(# edges) · (max # times each edge critical) $< |E_f| \, (|V|/4)$ iterations. In general,
$|E_f| \le 2 |E|$, so the number of iterations is at most (actually, fewer than)
$|E| \, |V| / 2$. But if we assume that $G$ always has edges in both directions (i.e.,
$(u, v) \in E$ if and only if $(v, u) \in E$), then $|E_f| \le |E|$, and the number of
iterations is at most $|E| \, |V| / 4$.


Solution to Exercise 26.3-3 

By definition, an augmenting path is a simple path $s \leadsto t$ in the residual graph $G'_f$.
Since $G$ has no edges between vertices in $L$ and no edges between vertices in $R$,
neither does the flow network $G'$ and hence neither does $G'_f$. Also, the only edges
involving $s$ or $t$ connect $s$ to $L$ and $R$ to $t$. Note that although edges in $G'$ can go
only from $L$ to $R$, edges in $G'_f$ can also go from $R$ to $L$.

Thus any augmenting path must go

\[ s \to L \to R \to \cdots \to L \to R \to t , \]

crossing back and forth between $L$ and $R$ at most as many times as it can do
so without using a vertex twice. It contains $s$, $t$, and equal numbers of
distinct vertices from $L$ and $R$: at most $2 + 2 \cdot \min(|L|, |R|)$ vertices in all. The
length of an augmenting path (i.e., its number of edges) is thus bounded above by
$2 \cdot \min(|L|, |R|) + 1$.


Solution to Exercise 26.4-2 

Each time we call Relabel($u$), we examine all edges $(u, v) \in E_f$. Since the
number of relabel operations is at most $2|V| - 1$ per vertex, edge $(u, v)$ will be
examined during relabel operations at most $4|V| - 2 = O(V)$ times (at most
$2|V| - 1$ times during calls to Relabel($u$) and at most $2|V| - 1$ times during
calls to Relabel($v$)). Summing up over all the possible residual edges, of which
there are at most $2|E| = O(E)$, we see that the total time spent relabeling vertices
is $O(VE)$.


Solution to Exercise 26.4-3 

We can find a minimum cut, given a maximum flow found in $G = (V, E)$ by a
push-relabel algorithm, in $O(V)$ time. First, find a height $\hat{h}$ such that $0 < \hat{h} < |V|$
and there is no vertex whose height equals $\hat{h}$ at termination of the algorithm.
Since $h[s] = |V|$ and $h[t] = 0$, we need consider only $|V| - 2$ vertices. Since
there are $|V| - 1$ possible values for $\hat{h}$, we know that for at least one number in
$1, 2, \ldots, |V| - 1$, there will be no vertex of that height. Hence, $\hat{h}$ is well defined,
and it is easy to find in $O(V)$ time by using a simple boolean array indexed by
heights $1, 2, \ldots, |V| - 1$.

Let $S = \{u \in V : h[u] > \hat{h}\}$ and $T = \{v \in V : h[v] < \hat{h}\}$. Because $h[s] = |V| > \hat{h}$,
we have $s \in S$, and because $h[t] = 0 < \hat{h}$, we have $t \in T$, as required for a cut.

We need to show that $f(u, v) = c(u, v)$, i.e., that $(u, v) \notin E_f$, for all $u \in S$ and
$v \in T$. Once we do that, we have that $f(S, T) = c(S, T)$, and by Corollary 26.6,
$(S, T)$ is a minimum cut.

Suppose for the purpose of contradiction that there exist vertices $u \in S$ and $v \in T$
such that $(u, v) \in E_f$. Because $h$ is always maintained as a height function
(Lemma 26.17), we have that $h[u] \le h[v] + 1$. But we also have $h[v] < \hat{h} < h[u]$,
and because all values are integer, $h[v] \le h[u] - 2$. Thus, we have $h[u] \le
h[v] + 1 \le h[u] - 2 + 1 = h[u] - 1$, which gives the contradiction that $0 \le -1$.
Thus, $(S, T)$ is a minimum cut.
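The search for the unused height and the resulting cut can be sketched in Python. The heights below are hypothetical final values rather than the output of an actual push-relabel run, and `min_cut_from_heights` is an assumed name.

```python
def min_cut_from_heights(h):
    """h maps each vertex to its final height, with h[s] == |V| and h[t] == 0."""
    n = len(h)
    seen = [False] * (n + 1)        # boolean array indexed by height 0..|V|
    for height in h.values():
        if height <= n:             # heights above |V| cannot be the gap
            seen[height] = True
    # Pigeonhole: some height strictly between 0 and |V| is unused.
    gap = next(k for k in range(1, n) if not seen[k])
    S = {u for u in h if h[u] > gap}
    T = {u for u in h if h[u] < gap}
    return gap, S, T


# Hypothetical final heights for a 4-vertex network.
heights = {"s": 4, "a": 5, "b": 1, "t": 0}
gap, S, T = min_cut_from_heights(heights)
```

Both passes are linear in $|V|$, matching the $O(V)$ bound claimed above.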
Solution to Exercise 26.4-6

If we set $h[s] = |V| - 2$, we have to change our definition of a height function to
allow $h[s] = |V| - 2$, rather than $h[s] = |V|$. The only change we need to make to
the proof of correctness is to update the proof of Lemma 26.18. The original proof
derives the contradiction that $h[s] \le k < |V|$, which is at odds with $h[s] = |V|$.
When $h[s] = |V| - 2$, there is no contradiction.

As in the original proof, let us suppose that we have a simple augmenting path
$\langle v_0, v_1, \ldots, v_k \rangle$, where $v_0 = s$ and $v_k = t$, so that $k < |V|$. How could $(s, v_1)$
be a residual edge? It had been saturated in Initialize-Preflow, which means
that we had to have pushed some flow from $v_1$ to $s$. In order for that to have
happened, we must have had $h[v_1] = h[s] + 1$. If we set $h[s] = |V| - 2$, that
means that $h[v_1]$ was $|V| - 1$ at the time. Since then, $h[v_1]$ did not decrease, and so
we have $h[v_1] \ge |V| - 1$. Working backwards over our augmenting path, we have
$h[v_{k-i}] \le h[t] + i$ for $i = 0, 1, \ldots, k$. As before, because the augmenting path is
simple, $k < |V|$. Letting $i = k - 1$, we have $h[v_1] \le h[t] + k - 1 < 0 + |V| - 1$.
We now have the contradiction that $h[v_1] \ge |V| - 1$ and $h[v_1] < |V| - 1$, which
shows that Lemma 26.18 still holds.

Nothing in the analysis changes asymptotically. 


Solution to Problem 26-2 

a. The idea is to use a maximum-flow algorithm to find a maximum bipartite 
matching that selects the edges to use in a minimum path cover. We must show 
how to formulate the max-flow problem and how to construct the path cover 
from the resulting matching, and we must prove that the algorithm indeed finds 
a minimum path cover. 

Define G' as suggested, with directed edges. Make G into a flow network 
with source xq and sink y 0 by defining all edge capacities to be 1. G' is the 
flow network corresponding to a bipartite graph G" in which L = {x\ , ... x n }, 
R = {yj, ... y n }, and the edges are the (undirected version of the) subset of E 
that doesn’t involve xo or y 0 . 

The relationship of G to the bipartite graph G" is that every vertex i in G is 
represented by two vertices, v, and y,, in G". Edge (i, j) in G corresponds to 
edge ( Xi , Vj ) in G". That is, an edge (x,-, yj) in G" means that an edge in G 
leaves i and enters j. v, tells us about edges leaving i and y tells us about 
edges entering i. 

The edges in a bipartite matching in G" can be used in a path cover of G, 
because: 

• In a bipartite matching, no vertex is used more than once. In a bipartite 
matching in G", the fact that no v, is used more than once means that at most 
one edge in the matching leaves any vertex i in G, and similarly the fact that 
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no yi is used more than once means that at most one edge in the matching 
enters any vertex i in G. 

• In a path cover, no vertex appears in more than one path, hence at most one 
path edge enters each vertex and at most one path edge leaves each vertex. 

We can construct a path cover P from any bipartite matching M (not just a
maximum matching) by moving from some x_i to its matching y_j (if any), then
from x_j to its matching y_k, and so on, as follows:

1. Start a new path containing a vertex i that has not yet been placed in a path.

2. If x_i is unmatched, the path can’t go any farther; just add it to P.

3. If x_i is matched to some y_j, add j to the current path. If j has already been
placed in a path (i.e., though we’ve just entered j by processing y_j, we’ve
already built a path that leaves j by processing x_j), combine this path with
that one and go back to step 1. Otherwise go to step 2 to process x_j.

This algorithm constructs a path cover because: 

• Every vertex is put into some path, because we keep picking an unused vertex 
from which to start a path until there are no unused vertices. 

• No vertex is put into two paths, because every x_i is matched to at most one y_j,
and vice versa. That is, at most one candidate edge leaves each vertex and at 
most one candidate edge enters each vertex. The normal path-building starts 
at or enters a vertex and then leaves it, building a single path. If we ever enter 
a vertex that was left earlier, it must have been the start of another path, since 
there are no cycles, and we combine those paths so that the vertex is entered 
and left on a single path. 

Every edge in M is used in some path because we visit every x_i, and we incorporate
the single edge, if any, from each visited x_i. Thus there is a one-to-one
correspondence between edges in the matching and edges in the constructed 
path cover. 

We now show that the path cover P constructed above has the fewest possible 
paths when the matching is maximum. 

Let f be the flow corresponding to the bipartite matching M.

|V| = Σ_{p ∈ P} (# vertices in p)              (every vertex is on exactly 1 path)
    = Σ_{p ∈ P} (1 + # edges in p)
    = Σ_{p ∈ P} 1 + Σ_{p ∈ P} (# edges in p)
    = |P| + (# edges in M)                     (by 1-to-1 correspondence)
    = |P| + |f|                                (Lemma 26.10).

Thus for the fixed set V in our graph G, |P| (the number of paths) is minimized
when the flow f is maximized.

Thus the overall algorithm is as follows: 

• Use Ford-Fulkerson to find a maximum flow in G', hence a maximum
bipartite matching M in G''.

• Construct the path cover as described above.

Time: O(VE) total

• O(V + E) to set up G'

• O(VE) to find the maximum bipartite matching

• O(E) to trace the paths, because each edge in M is traversed only once and
there are O(E) edges in M.
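
The overall algorithm can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the manual's own code: the name `min_path_cover` and the vertex numbering 1..n are chosen for illustration, the matching is found by repeated augmenting-path search (Ford-Fulkerson specialized to unit capacities), and paths are traced forward from vertices with no matched entering edge rather than merged as in step 3, which yields the same cover when G is acyclic.

```python
def min_path_cover(n, edges):
    """Minimum path cover of a dag on vertices 1..n (hypothetical helper)."""
    adj = {i: [] for i in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)

    match_x = {}  # i -> j means x_i is matched to y_j (path edge i -> j)
    match_y = {}  # j -> i, the inverse map

    def augment(i, seen):
        # Try to match x_i, rerouting earlier matches along an augmenting path.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in match_y or augment(match_y[j], seen):
                match_y[j] = i
                match_x[i] = j
                return True
        return False

    for i in range(1, n + 1):
        augment(i, set())

    # Each vertex with no matched edge entering it starts exactly one path.
    paths = []
    for start in range(1, n + 1):
        if start in match_y:
            continue
        path = [start]
        while path[-1] in match_x:
            path.append(match_x[path[-1]])
        paths.append(path)
    return paths
```

For the dag with edges (1, 2), (2, 3), (1, 4), the sketch returns the two paths 1 → 2 → 3 and 4, matching |P| = |V| - |M| = 4 - 2.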

b. The algorithm does not work if there are cycles. 

Consider a graph G with 4 vertices, consisting of a directed triangle and an 
edge pointing to the triangle: 

E = {(1,2), (2, 3), (3,1), (4,1)} 

G can be covered with a single path: 4 → 1 → 2 → 3, but our algorithm
might find only a 2-path cover.

In the bipartite graph G'', the edges (x_i, y_j) are
(x_1, y_2), (x_2, y_3), (x_3, y_1), (x_4, y_1).

There are 4 edges from an x_i to a y_j, but 2 of them lead to y_1, so a maximum
bipartite matching can have only 3 edges (and the maximum flow in G' is 3). In
fact, there are 2 possible maximum matchings. It is always possible to match
x_1 → y_2 and x_2 → y_3, but then either x_3 → y_1 or x_4 → y_1 can be chosen, but
not both.

One of our max-flow algorithms could find the
flow corresponding to either of these matchings, since both are maximum. But
one of the matchings doesn’t contain an edge to or from vertex 4, so given 
that matching, our path algorithm is forced to produce 2 paths, one of which 
contains just the vertex 4. 


Solution to Problem 26-4 

a. Just execute one iteration of the Ford-Fulkerson algorithm. The edge (u, v) in E
with increased capacity ensures that the edge (u, v) is in the residual graph. So
look for an augmenting path and update the flow if a path is found.

Time: O(V + E) = O(E) if we find the augmenting path with either
depth-first or breadth-first search.

To see that only one iteration is needed, consider separately the cases in which 
(u, v) is or is not an edge that crosses a minimum cut. If (u, v) does not cross a 
minimum cut, then increasing its capacity does not change the capacity of any 
minimum cut, and hence the value of the maximum flow does not change. If 
(u, v) does cross a minimum cut, then increasing its capacity by 1 increases the 
capacity of that minimum cut by 1, and hence possibly the value of the maximum
flow by 1. In this case, there is either no augmenting path (in which case
there was some other minimum cut that (u, v) does not cross), or the augmenting
path increases flow by 1. No matter what, one iteration of Ford-Fulkerson
suffices.
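
As a rough illustration of part (a), the following Python sketch performs the single iteration on a residual network. The dictionary representation and the name `raise_capacity_and_augment` are assumptions made for this example; they do not come from the text.

```python
from collections import defaultdict, deque

def raise_capacity_and_augment(residual, s, t, u, v):
    """Raise c(u, v) by 1, then run one Ford-Fulkerson iteration.

    residual maps (a, b) to the residual capacity of (a, b) under a
    maximum flow (assumed representation); it is updated in place.
    Returns the increase in the flow value, which is 0 or 1.
    """
    residual[(u, v)] = residual.get((u, v), 0) + 1
    adj = defaultdict(list)
    for (a, b), r in residual.items():
        if r > 0:
            adj[a].append(b)

    # Breadth-first search for an augmenting path from s to t: O(V + E).
    parent = {s: None}
    queue = deque([s])
    while queue and t not in parent:
        a = queue.popleft()
        for b in adj[a]:
            if b not in parent:
                parent[b] = a
                queue.append(b)
    if t not in parent:
        return 0  # no augmenting path: the maximum flow value is unchanged

    # Push one unit of flow back along the path found.
    b = t
    while parent[b] is not None:
        a = parent[b]
        residual[(a, b)] -= 1
        residual[(b, a)] = residual.get((b, a), 0) + 1
        b = a
    return 1
```

On the two-edge path s → a → t with unit capacities, raising c(s, a) alone leaves the flow unchanged because (a, t) is still a minimum cut; raising c(a, t) afterward yields one more unit.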

b. Let f be the maximum flow before reducing c(u, v).

If f(u, v) = 0, we don’t need to do anything.

If f(u, v) > 0, we will need to update the maximum flow. Assume from now
on that f(u, v) > 0, which in turn implies that f(u, v) ≥ 1.

Define f'(x, y) = f(x, y) for all x, y ∈ V, except that f'(u, v) = f(u, v) - 1.
Although f' obeys all capacity constraints, even after c(u, v) has been reduced,
it is not a legal flow, as it violates skew symmetry and flow conservation at u
and v. f' has one more unit of flow entering u than leaving it, and it has one
more unit of flow leaving v than entering v.

The idea is to try to reroute this unit of flow so that it goes out of u and into v 
via some other path. If that is not possible, we must reduce the flow from s to u 
and from v to t by one unit. 

Look for an augmenting path from u to v (note: not from s to t). 

• If there is such a path, augment the flow along that path. 

• If there is no such path, reduce the flow from s to u by augmenting the flow
from u to s. That is, find an augmenting path u ⇝ s and augment the flow
along that path. (There definitely is such a path, because there is flow from s
to u.) Similarly, reduce the flow from v to t by finding an augmenting path
t ⇝ v and augmenting the flow along that path.

Time: O(V + E) = O(E) if we find the paths with either DFS or BFS.


Solution to Problem 26-5 

a. The capacity of a cut is defined to be the sum of the capacities of the edges
crossing it. Since the number of such edges is at most |E|, and the capacity of
each edge is at most C, the capacity of any cut of G is at most C |E|.

b. The capacity of an augmenting path is the minimum capacity of any edge on the 
path, so we are looking for an augmenting path whose edges all have capacity at 
least K. Do a breadth-first search or depth-first search as usual to find the path,
considering only edges with residual capacity at least K. (Treat lower-capacity
edges as though they don’t exist.) This search takes O(V + E) = O(E) time.
(Note that |V| = O(E) in a flow network.)

c. Max-Flow-By-Scaling uses the Ford-Fulkerson method. It repeatedly augments
the flow along an augmenting path until there are no augmenting paths
of capacity ≥ 1. Since all the capacities are integers, and the capacity
of an augmenting path is positive, this means that there are no augmenting
paths whatsoever in the residual graph. Thus, by the max-flow min-cut theorem,
Max-Flow-By-Scaling returns a maximum flow.

d. • The first time line 4 is executed, the capacity of any edge in G_f equals its
capacity in G, and by part (a) the capacity of a minimum cut of G is at most
C |E|. Initially K = 2^⌊lg C⌋, hence 2K = 2 · 2^⌊lg C⌋ = 2^(⌊lg C⌋+1) > 2^(lg C) = C.
So the capacity of a minimum cut of G_f is initially less than 2K |E|.

• The other times line 4 is executed, K has just been halved, so the capacity
of a cut of G_f is at most 2K |E| at line 4 if and only if that capacity was at
most K |E| when the while loop of lines 5-6 last terminated. So we want
to show that when line 7 is reached, the capacity of a minimum cut of G_f is
at most K |E|.

Let G_f be the residual network when line 7 is reached.

There is no augmenting path of capacity ≥ K in G_f
⇒ max flow f' in G_f has value |f'| < K |E|
⇒ min cut in G_f has capacity < K |E|.

e. By part (d), when line 4 is reached, the capacity of a minimum cut of G_f is at
most 2K |E|, and thus the maximum flow in G_f is at most 2K |E|.

By an extension of Lemma 26.2, the value of the maximum flow in G equals
the value of the current flow in G plus the value of the maximum flow in G_f.
(Lemma 26.2 shows that, given a flow f in G, every flow f' in G_f induces a
flow f + f' in G; the reverse claim, that every flow f + f' in G induces a
flow f' in G_f, is proved in a similar manner. Together these claims provide the
necessary correspondence between a maximum flow in G and a maximum flow
in G_f.) Therefore, the maximum flow in G is at most 2K |E| more than the
current flow in G. Every time the inner while loop finds an augmenting path
of capacity at least K, the flow in G increases by ≥ K. Since the flow cannot
increase by more than 2K |E|, the loop executes at most (2K |E|)/K = 2 |E|
times.

f. The time complexity is dominated by the loop of lines 4-7. (The lines outside
the loop take O(E) time.) The outer while loop executes O(lg C) times,
since K is initially O(C) and is halved on each iteration, until K < 1. By
part (e), the inner while loop executes O(E) times for each value of K; and by
part (b), each iteration takes O(E) time. Thus, the total time is O(E^2 lg C).
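
The scaling procedure analyzed in parts (b) through (f) can be sketched in Python as follows. This is an illustrative sketch, not the text's pseudocode: the name `max_flow_by_scaling`, the dictionary of capacities, and the vertex numbering 0..n-1 are assumptions, and s ≠ t is assumed.

```python
from collections import deque

def max_flow_by_scaling(n, capacity, s, t):
    """capacity: dict {(u, v): positive int}; vertices 0..n-1; s != t."""
    residual = {}
    adj = [[] for _ in range(n)]
    for (u, v), c in capacity.items():
        for a, b in ((u, v), (v, u)):
            if (a, b) not in residual:
                residual[(a, b)] = 0
                adj[a].append(b)
        residual[(u, v)] += c

    def find_path(k):
        # Part (b): BFS that ignores residual edges of capacity below k.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if u == t:
                break
            for v in adj[u]:
                if v not in parent and residual[(u, v)] >= k:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return None
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        return path

    big_c = max(capacity.values(), default=0)
    k = 1 << (big_c.bit_length() - 1) if big_c > 0 else 0  # 2^floor(lg C)
    flow_value = 0
    while k >= 1:                      # outer loop: O(lg C) iterations
        path = find_path(k)
        while path is not None:        # inner loop: O(E) iterations per k
            bottleneck = min(residual[e] for e in path)
            for u, v in path:
                residual[(u, v)] -= bottleneck
                residual[(v, u)] += bottleneck
            flow_value += bottleneck
            path = find_path(k)
        k //= 2
    return flow_value
```

With capacities (0,1)=3, (0,2)=2, (1,2)=1, (1,3)=2, (2,3)=3, source 0 and sink 3, the sketch starts with K = 2, pushes 2 units along each of two paths, then finds one more unit at K = 1, for a maximum flow of 5.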


Chapter 27 overview

Sorting networks 

An example of parallel algorithms. 

We’ll see how, if we allow a certain kind of parallelism, we can sort in O(lg^2 n)
“time.”

Along the way, we’ll see the 0-1 principle, which is a great way to prove the
correctness of any comparison-based sorting algorithm.


Comparison networks 

Comparator 

x ──●── min(x, y)
    │
y ──●── max(x, y)

Works in O(1) time.



Wires go straight, left to right. 

Each comparator has inputs/outputs on some pair of wires. 

Claim that this comparison network will sort any set of 4 input values:

• After leftmost comparators, minimum is on either wire 1 (from top) or 3,
maximum is on either wire 2 or 4. 

• After next 2 comparators, minimum is on wire 1, maximum on wire 4. 

• Last comparator gets correct values onto wires 2 and 3. 
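
The claim can be checked by brute force. The sketch below assumes that the network consists of the comparators implied by the three bullets above, namely (1,2) and (3,4), then (1,3) and (2,4), then (2,3); the list `NETWORK_4` and the helper names are introduced here for illustration only.

```python
from itertools import permutations

# Comparators as (upper wire, lower wire), numbered from the top, left to right.
NETWORK_4 = [(1, 2), (3, 4), (1, 3), (2, 4), (2, 3)]

def run_network(network, values):
    """Apply each comparator in order: min goes to the upper wire."""
    wires = list(values)
    for i, j in network:
        a, b = wires[i - 1], wires[j - 1]
        wires[i - 1], wires[j - 1] = min(a, b), max(a, b)
    return wires

def sorts_all_permutations(network, n):
    """True if the network sorts every permutation of n distinct values."""
    return all(run_network(network, p) == sorted(p)
               for p in permutations(range(n)))
```

Calling `sorts_all_permutations(NETWORK_4, 4)` tries all 4! = 24 inputs.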

Running time = depth = longest path of comparators. (3 in previous example.) 

• Think of dag of comparators that depend on each other. Depth = longest path 
through dag (counting vertices, not edges). 



• Depth ≠ max # of comparators attached to a single wire.
• In the above example, that is 2. 

Selection sorter 


To find max of 5 values: 



Can repeat, decreasing # of values: 



Depth: D(n) = D(n - 1) + 2
       D(2) = 1
       ⇒ D(n) = 2n - 3
               = Θ(n).

If view depth as “time,” parallelism gets us a faster method than any sequential 
comparison sort! 
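
The depth 2n - 3 can be confirmed numerically. This sketch assumes the selection sorter is laid out as repeated passes of adjacent comparators (i, i+1), each pass moving the maximum of the remaining wires to the bottom; the function names are hypothetical.

```python
def selection_sorter(n):
    """Comparators (i, i+1), wires numbered 1..n, listed left to right."""
    network = []
    for size in range(n, 1, -1):      # find the max of wires 1..size
        for i in range(1, size):
            network.append((i, i + 1))
    return network

def depth(network, n):
    """Longest chain of dependent comparators (counting comparators)."""
    d = [0] * (n + 1)
    for i, j in network:
        d[i] = d[j] = max(d[i], d[j]) + 1
    return max(d)
```

For n = 4 the six comparators have depths 1, 2, 3, 3, 4, 5, so the network has depth 5 = 2 · 4 - 3.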

Can view the same network as insertion sort: 



[This material answers Exercise 27.1-6, showing that the network in Figure 27.3
does correctly sort and showing its relationship to insertion sort.]


Zero-one principle


How can we test if a comparison network sorts? 

• We could try all n! permutations of input.

• But we need to test only 2^n inputs: the sequences of 0’s and 1’s. This is
many fewer than all n! permutations.


Theorem (0-1 principle) 

If a comparison network with n inputs sorts all 2^n sequences of 0’s and 1’s, then it
sorts all sequences of arbitrary numbers.

Note: In practice, we don’t even have to reason about “all 2^n sequences.” Instead,
we look at the patterns of 0’s and 1’s, as we’ll see later.
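
As a small illustration of how the principle is used, the following sketch tests a comparison network on all 2^n zero-one inputs. The comparator-list representation (pairs of wire numbers, 1-indexed from the top) and the function names are assumptions made for this example.

```python
from itertools import product

def run_network(network, values):
    """Apply each comparator (i, j) in order: min to wire i, max to wire j."""
    wires = list(values)
    for i, j in network:
        a, b = wires[i - 1], wires[j - 1]
        wires[i - 1], wires[j - 1] = min(a, b), max(a, b)
    return wires

def sorts_all_zero_one_inputs(network, n):
    """By the 0-1 principle, True here implies the network sorts every input."""
    return all(run_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))
```

For a 4-input network this examines 16 inputs rather than 24 permutations; the gap widens quickly, since 2^n grows far more slowly than n!.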


Lemma 

If a comparison network transforms

a = (a_1, a_2, ..., a_n) into b = (b_1, b_2, ..., b_n),

then for any monotonically increasing function f, it transforms

f(a) = (f(a_1), f(a_2), ..., f(a_n)) into f(b) = (f(b_1), f(b_2), ..., f(b_n)).


Sketch of proof 


A single comparator with inputs x and y produces min(x, y) and max(x, y).
With inputs f(x) and f(y), it produces

min(f(x), f(y)) = f(min(x, y))
max(f(x), f(y)) = f(max(x, y))

since f is monotonically increasing.

Then use induction on comparator depth.

■ (lemma)


Proof (of 0-1 principle) 

Suppose that the principle is not true, so that an n-input comparison network sorts
all 0-1 sequences, but there is a sequence (a_1, a_2, ..., a_n) such that a_i < a_j but a_i
comes after a_j in the output.

Define the monotonically increasing function

f(x) = 0 if x ≤ a_i,
       1 if x > a_i.

By the lemma, if we give the input (f(a_1), f(a_2), ..., f(a_n)), then the output will
have f(a_i) after f(a_j):

... f(a_j) = 1 ... f(a_i) = 0 ...

But that’s a 0-1 sequence that is sorted incorrectly, a contradiction. ■ (theorem)


A bitonic sorting network


Constructing a sorting network 


Step 1: Construct a “bitonic sorter.” It sorts any bitonic sequence. 

A sequence is bitonic if it monotonically increases, then monotonically decreases, 
or it can be circularly shifted to become so. 

Examples: (1, 3, 7, 4, 2)

(6, 8, 3, 1, 2, 4)

(8, 7, 2, 1, 3, 5)

Any sequence of 1 or 2 numbers 
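
The definition can be turned into a direct test. The sketch below is one straightforward reading of it, taking “monotonically increases” to allow equal neighbors; the function name `is_bitonic` is introduced here and is not from the text.

```python
def is_bitonic(seq):
    """True if some circular shift of seq rises (weakly) and then falls (weakly)."""
    s = list(seq)

    def up_then_down(r):
        i = 0
        while i + 1 < len(r) and r[i] <= r[i + 1]:   # climbing phase
            i += 1
        while i + 1 < len(r) and r[i] >= r[i + 1]:   # descending phase
            i += 1
        return i + 1 >= len(r)                       # consumed the whole sequence

    return any(up_then_down(s[k:] + s[:k]) for k in range(max(len(s), 1)))
```

All three example sequences above pass this test, while (1, 3, 2, 4) does not: no rotation of it rises and then falls.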


For 0-1 sequences (which we can focus on), bitonic sequences have the form

0^i 1^j 0^k   or   1^i 0^j 1^k.


Half-cleaner: 


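[Figure: two example half-cleaners on 8 wires, each with a bitonic 0-1 input. In
the first, the output is 0 0 0 0 (top half, clean) and 1 0 1 1 (bottom half,
bitonic). In the second, the output is 0 0 1 0 (top half, bitonic) and 1 1 1 1
(bottom half, clean).]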


Depth = 1. 


Lemma 

If the input to a half-cleaner is a bitonic 0-1 sequence, then for the output: 

• both the top and bottom half are bitonic, 

• every element in the top half is < every element in the bottom half, and 

• at least one of the halves is clean: all 0’s or all 1’s.
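
The lemma can be checked exhaustively for small n. The sketch below assumes the usual definition of a half-cleaner from the text's section on bitonic sorters, in which wire i is compared with wire i + n/2; the helper names are hypothetical.

```python
def half_cleaner(bits):
    """Compare wire i with wire i + n/2 for i = 0, ..., n/2 - 1 (n even)."""
    out = list(bits)
    half = len(out) // 2
    for i in range(half):
        a, b = out[i], out[i + half]
        out[i], out[i + half] = min(a, b), max(a, b)
    return out

def is_bitonic_01(bits):
    # Read circularly, a bitonic 0-1 sequence changes value at most twice.
    n = len(bits)
    return sum(bits[i] != bits[(i + 1) % n] for i in range(n)) <= 2

def lemma_holds(bits):
    """Check the three properties of the lemma for one bitonic 0-1 input."""
    out = half_cleaner(bits)
    half = len(out) // 2
    top, bottom = out[:half], out[half:]
    clean_top = len(set(top)) == 1
    clean_bottom = len(set(bottom)) == 1
    return (is_bitonic_01(top) and is_bitonic_01(bottom)
            and max(top) <= min(bottom)
            and (clean_top or clean_bottom))
```

For the bitonic input 0 0 1 1 1 0 0 0, the half-cleaner outputs 0 0 0 0 on top (clean) and 1 0 1 1 on the bottom (bitonic).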


Skipping proof: see book (not difficult at all).


Bitonic sorter:

[Figure: bitonic sorter on 8 wires, built from a half-cleaner followed by two
4-wire bitonic sorters. The example bitonic input 0 0 1 1 1 0 0 0 produces the
sorted output 0 0 0 0 0 1 1 1.]


Depth: D(n) = D(n/2) + 1
       D(2) = 1
       ⇒ D(n) = lg n.
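
The recursive structure behind this recurrence can be sketched as follows: one half-cleaner on all n wires, then a bitonic sorter on each half. Wires are numbered from 0 and n is assumed to be a power of 2; the function names are illustrative.

```python
def bitonic_sorter(lo, n):
    """Comparators sorting a bitonic input on wires lo .. lo+n-1."""
    if n == 1:
        return []
    half = n // 2
    stage = [(lo + i, lo + i + half) for i in range(half)]  # half-cleaner
    return stage + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def run_network(network, values):
    """Apply each comparator (i, j): smaller value to wire i."""
    wires = list(values)
    for i, j in network:
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires

def depth(network, n):
    """Longest chain of dependent comparators."""
    d = [0] * n
    for i, j in network:
        d[i] = d[j] = max(d[i], d[j]) + 1
    return max(d)
```

For n = 8 the network has depth 3 = lg 8 and sorts the example input 0 0 1 1 1 0 0 0 into 0 0 0 0 0 1 1 1.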

Step 2: Construct a merging network. 

It merges 2 sorted sequences. 

Adapt a half-cleaner. 

Idea: Given 2 sorted sequences, reverse the second one, then concatenate with the 
first one => get a bitonic sequence. 

Example: 

X = 0011     Y = 0111
             Y^R = 1110

X Y^R = 00111110 (bitonic)

So, we can merge X and Y by doing a bitonic sort on X and Y^R.

How to reverse Y? Don’t!

Instead, reverse the bottom half of the connections of the first half-cleaner: 


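[Figure: first stage of the merging network on 8 wires: a half-cleaner with the
bottom half of its connections reversed. The inputs X and Y are each sorted; of
the two output halves, one is bitonic and one is clean.]


Full merging network:

[Figure: full merging network on 8 wires. Sorted inputs X and Y pass through the
modified half-cleaner and then through two 4-wire bitonic sorters; the example
output is 0 0 0 1 1 1 1 1 (sorted).]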


Depth is same as bitonic sorter: lg n. 
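
A sketch of the merging network, under the assumption that the reversed first stage compares wire i with wire n - 1 - i (wires numbered from 0), which is how reversing the bottom half of a half-cleaner's connections works out. The function names are hypothetical.

```python
def bitonic_sorter(lo, n):
    """Comparators sorting a bitonic input on wires lo .. lo+n-1."""
    if n == 1:
        return []
    half = n // 2
    stage = [(lo + i, lo + i + half) for i in range(half)]
    return stage + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def merger(lo, n):
    """Merge two sorted halves occupying wires lo .. lo+n-1."""
    if n == 1:
        return []
    half = n // 2
    # Modified half-cleaner: wire i pairs with wire n-1-i, not i + n/2.
    stage = [(lo + i, lo + n - 1 - i) for i in range(half)]
    return stage + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def run_network(network, values):
    wires = list(values)
    for i, j in network:
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires
```

With X = 0011 on the top four wires and Y = 0111 on the bottom four, the first stage produces 0 0 1 0 (bitonic) and 1 1 1 1 (clean), and the full merger outputs 0 0 0 1 1 1 1 1.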


Step 3: Construct a sorting network. 

Recursive merging—like merge sort, bottom-up: 



[Figure: sorting network on 8 wires, built from three stages of mergers: four
2-input mergers, then two 4-input mergers, then one 8-input merger. The output
is sorted.]


Depth: D(n) = D(n/2) + lg n
       D(2) = 1
       ⇒ D(n) = Θ(lg^2 n)     (Exercise 4.4-2).
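
The three steps can be combined into one sketch that builds the sorting network recursively and measures its depth. Wires are numbered from 0, n is assumed to be a power of 2, and all function names are illustrative rather than taken from the text.

```python
def bitonic_sorter(lo, n):
    if n == 1:
        return []
    half = n // 2
    stage = [(lo + i, lo + i + half) for i in range(half)]
    return stage + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def merger(lo, n):
    if n == 1:
        return []
    half = n // 2
    stage = [(lo + i, lo + n - 1 - i) for i in range(half)]
    return stage + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def sorter(lo, n):
    """Sort each half recursively, then merge: merge sort as a network."""
    if n == 1:
        return []
    half = n // 2
    return sorter(lo, half) + sorter(lo + half, half) + merger(lo, n)

def run_network(network, values):
    wires = list(values)
    for i, j in network:
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires

def depth(network, n):
    d = [0] * n
    for i, j in network:
        d[i] = d[j] = max(d[i], d[j]) + 1
    return max(d)
```

For n = 8 the sketch yields 24 comparators and depth 6 = 1 + 2 + 3, and running it on all 2^8 zero-one inputs is enough, by the 0-1 principle, to confirm that it sorts.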

Use 0-1 principle to prove that this sorts all inputs.
Can we do better? 

Yes: the AKS network has depth O(lg n).

• Huge constant—over 1000. 

• Really hard to construct. 

• Highly impractical—of theoretical interest only. 



Solutions for Chapter 27: 
Sorting Networks 


Solution to Exercise 27.1-4 

Consider any input element x. After 1 level of the network, x can be in at most 2
different places, in at most 4 places after 2 levels, and so forth. Thus we need at
least lg n depth to be able to move x to the right place, which could be any of the n
(= 2^(lg n)) outputs.


Solution to Exercise 27.1-5 

Simulation of any sorting network on a serial machine is a comparison sort,
hence there are Ω(n lg n) comparisons/comparators. Intuitively, since the depth
is Ω(lg n) and we can perform at most n/2 comparisons at each depth of the
network, this Ω(n lg n) bound makes sense.


Solution to Exercise 27.1-7 

We take advantage of the comparators appearing in sorted order within the network 
in the following pseudocode. 

for i ← 1 to n
    do d[i] ← 0
for each comparator (i, j) in the list of comparators
    do d[i] ← d[j] ← max(d[i], d[j]) + 1
return max_{1 ≤ i ≤ n} d[i]

This algorithm implicitly finds the longest path in a dag of the comparators (in
which an edge connects each comparator to the comparators that need its outputs).
Even though we don’t explicitly construct the dag, the sorted order of the
comparator list is a topological sort of the dag.

The first for loop takes O(n) time, the second for loop takes O(c) time, and
computing the maximum d[i] value in the return statement takes O(n) time, for a total
of O(n + c) time.
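
A direct Python rendering of the pseudocode, as a sketch: the name `network_depth` is introduced here, wires are numbered 1..n, and the comparator list is assumed to be in the sorted (topological) order described above.

```python
def network_depth(n, comparators):
    """Depth of a comparison network given as a list of (i, j) wire pairs."""
    d = [0] * (n + 1)                       # d[i] = depth reached on wire i
    for i, j in comparators:
        d[i] = d[j] = max(d[i], d[j]) + 1   # comparator sits after both inputs
    return max(d[1:], default=0)
```

On the 4-input network (1,2), (3,4), (1,3), (2,4), (2,3) the wire depths evolve to d = [2, 3, 3, 2], so the function returns 3.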


Solution to Exercise 27.2-2


In both parts of the proof, we will be using a set {f_1, f_2, ..., f_{n-1}} of
monotonically increasing functions, where

f_k(x) = 0 if x ≤ k,
         1 if x > k.

For convenience, let us also define the sequences s_1, s_2, ..., s_{n-1}, where s_i is the
sequence consisting of n - i 1’s followed by i 0’s.


⇒: Assume that the sequence (n, n - 1, ..., 1) is correctly sorted by the given
comparison network. Then by Lemma 27.1, we know that applying any monotonically
increasing function to the sequence s = (n, n - 1, ..., 1) produces a sequence that
is also correctly sorted by the given comparison network. For k = 1, 2, ..., n - 1,
when we apply the monotonically increasing function f_k to the sequence s, the
resulting sequence is s_k, which is correctly sorted by the comparison network.


⇐: Now assume that the comparison network fails to correctly sort the input
sequence (n, n - 1, ..., 1). Then there are elements i and j in this sequence
for which i < j but i appears after j in the output sequence. Consider the input
sequence (f_i(n), f_i(n - 1), ..., f_i(1)), which is the same as the sequence s_i. By
Lemma 27.1, the network produces an output sequence in which f_i(i) appears
after f_i(j). But f_i(i) = 0 and f_i(j) = 1, and so the network fails to sort the input
sequence s_i.


Solution to Exercise 27.5-1 


SORTER[n] consists of (n/4) lg^2 n + (n/4) lg n = Θ(n lg^2 n) comparators. To see
this result, we first note that MERGER[n] consists of (n/2) lg n comparators, since
it has lg n levels, each with n/2 comparators.

If we denote the number of comparators in SORTER[n] by C(n), we have the
recurrence

C(n) = 0                       if n = 1,
       2C(n/2) + (n/2) lg n    if n = 2^k and k ≥ 1.


We prove that C(n) = (n/4) lg^2 n + (n/4) lg n by induction on k.

Basis: When k = 0, we have n = 1. Then (n/4) lg^2 n + (n/4) lg n = 0 = C(n).

Inductive step: Assume that the inductive hypothesis holds for k - 1, so that
C(n/2) = (n/8) lg^2(n/2) + (n/8) lg(n/2) = (n/8)(lg n - 1)^2 + (n/8)(lg n - 1).
We have

C(n) = 2C(n/2) + (n/2) lg n
     = 2((n/8)(lg n - 1)^2 + (n/8)(lg n - 1)) + (n/2) lg n
     = (n/4) lg^2 n - (n/2) lg n + n/4 + (n/4) lg n - n/4 + (n/2) lg n
     = (n/4) lg^2 n + (n/4) lg n.
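
As a numerical sanity check on the induction, the recurrence and the closed form can be compared for small powers of 2. The function names below are hypothetical, and n is assumed to be a power of 2 so that `n.bit_length() - 1` equals lg n.

```python
def comparators_by_recurrence(n):
    """C(n) = 2 C(n/2) + (n/2) lg n, with C(1) = 0."""
    if n == 1:
        return 0
    lg = n.bit_length() - 1
    return 2 * comparators_by_recurrence(n // 2) + (n // 2) * lg

def comparators_closed_form(n):
    """(n/4) lg^2 n + (n/4) lg n, written as n lg n (lg n + 1) / 4."""
    lg = n.bit_length() - 1
    return n * lg * (lg + 1) // 4
```

For example, C(8) = 2 · C(4) + 4 · 3 = 2 · 6 + 12 = 24, and the closed form gives 8 · 3 · 4 / 4 = 24.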


Solution to Exercise 27.5-2 


We show by substitution that the recurrence for the depth of SORTER[n],

D(n) = 0                  if n = 1,
       D(n/2) + lg n      if n = 2^k and k ≥ 1,

has the solution D(n) = (lg n)(lg n + 1)/2.

Basis: When k = 0, we have n = 1. Then (lg n)(lg n + 1)/2 = 0 = D(1).

Inductive step: Assume that the inductive hypothesis holds for k - 1, so that
D(n/2) = (lg(n/2))(lg(n/2) + 1)/2 = (lg n - 1)(lg n)/2. We have

D(n) = D(n/2) + lg n
     = (lg n - 1)(lg n)/2 + lg n
     = (lg^2 n - lg n)/2 + lg n
     = (lg^2 n + lg n)/2
     = (lg n)(lg n + 1)/2.
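
The same kind of check works for the depth. This sketch uses hypothetical function names and assumes n is a power of 2.

```python
def depth_by_recurrence(n):
    """D(n) = D(n/2) + lg n, with D(1) = 0."""
    if n == 1:
        return 0
    return depth_by_recurrence(n // 2) + (n.bit_length() - 1)

def depth_closed_form(n):
    """(lg n)(lg n + 1) / 2."""
    lg = n.bit_length() - 1
    return lg * (lg + 1) // 2
```

For n = 8 both give 1 + 2 + 3 = 6.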